Assessing the Vulnerability of California Water Utilities to Wildfires

Wildfires are becoming more frequent and destructive in California, and it is essential to quantify their potential impacts on drinking water utilities. This study measures the severity of wildfires for each California water utility based on the exposure frequency and the extent of area burned by wildfires in each service area. Our quantitative models show an association between water utility characteristics and vulnerability to wildfires. Findings indicate that wildfire vulnerability is higher in government-owned utilities than in private ones, in utilities relying primarily on surface water than in those relying on groundwater, and in utilities using locally sourced water than in those using purchased water. We also find a stronger association between wildfire vulnerability and large utilities, in terms of population served, than small or medium ones. Regarding geography, wildfire vulnerability is higher in Southern and coastal California utilities than in Northern and inland California utilities. These results help water utilities and land managers identify vulnerable locations and develop wildfire management and disaster preparedness strategies.


Introduction
Over the past several decades, climate change, extensive droughts, population growth, urbanization, and land-use changes have raised wildfire risks and severity throughout the western US. In recent years, California has experienced a substantial increase in the

Data Description
Our study focuses on California's approximately 7,500 water utilities. Our primary study data is from California wildfires that occurred from 1991 to 2020 within the boundaries of water utility service areas. Our final dataset for the analysis combines (1) spatial-temporal data on wildfires obtained from the California Fire and Resource Assessment Program (FRAP) (2021) and the National Interagency Fire Center (2021) and (2) data on approximately 7,500 water utility boundaries and characteristics acquired from the California State Water Resources Control Board, the US Environmental Protection Agency, and the California Department of Water Resources. We used ArcGIS 10.6 and programming languages to spatially merge these data to the water utility service areas and generate outcome variables.
Our final pooled cross-sectional data contains information on wildfire-related factors used as outcome variables and on characteristics of water utilities used as explanatory variables (Table 1). The wildfire-related factors that the FRAP provides include the date, name, location, and cause of each wildfire, the acres burned, and the date of fire suppression. This data is an annual digital record of fire perimeters in California generated by compiling fire perimeters and establishing an ongoing fire perimeter data capture process. When publicly released, the data is current as of the last calendar year and is limited to fires already suppressed by the FRAP.
Table 1 Description for the selected variables

| Outcome Variables | Description |
|---|---|
| Exposure frequency | Total number of wildfires that occurred for the specific period (number counted) |
| Total area burned | Summarized total land acres burned for the specific period (acres) |
| Severity | Total land acres burned divided by the total number of events that occurred (1,000 acres) (i.e., wildfire extent / wildfire exposure frequency) |
| Log severity | Taking the logarithm of the severity |

| Explanatory Variables | Description |
|---|---|
| Ownership | Local government-owned utility (1 = government, 0 = private) |
| Primary source | Utility relying primarily on groundwater as a primary water source (1 = groundwater, 0 = surface water) |
| Trade | Utility that is mainly dependent on purchased water (1 = purchased, 0 = owned) |
| 1st geographic location | Utility located in Southern California (1 = SoCal, 0 = NoCal) |
| 2nd geographic location | Utility located in coastal California (1 = coastal, 0 = inland) |
| Population size (a) | Population size served by a utility (i.e., 0 = small-populated level, 1 = medium-populated level, 2 = large-populated level utility) |

(a) A small-populated level is when utilities serve fewer than 3,300 people; a medium-populated level is when utilities serve 3,300 to fewer than 100,000 people; and a large-populated level is when utilities serve more than 100,000 people.

Using these factors, we measure the historical fire exposure for each water utility service area based on the number and acres of historical wildfires in each utility service area and measure the wildfire severity. We expect wildfire severity to vary with each utility's characteristics, which allows us to assess their vulnerability to wildfires. The raw historical data of wildfires is available from 1878 to 2020. To examine how effects differ across periods, in line with one of our research questions, we created four subsample datasets counting back from the latest data point: the last 30-year period (1991-2020), the last 20-year period (2001-2020), the last 10-year period (2011-2020), and the last 5-year period (2016-2020). We combine each of these fire-related datasets with data on 4,957 water utilities. Along with each subsample, we use data on a set of water utility characteristics: ownership (local government versus private sector), the primary water source (groundwater versus surface water), water imports (purchased water versus local sources of water), geographic location (Northern versus Southern California, and inland versus coastal California), and population size served by utilities (small, medium, and large). More specifically, we defined population size served by utilities as (1) a small-populated level when utilities serve fewer than 3,300 people, (2) a medium-populated level when utilities serve 3,300 to fewer than 100,000 people, and (3) a large-populated level when utilities serve more than 100,000 people.
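As an illustration of how the outcome variables could be built, the following is a minimal sketch rather than the authors' actual workflow (which used ArcGIS 10.6). The fire records, utility identifiers, and the `severity_table` helper are hypothetical; only the variable definitions and the four period windows come from the text.

```python
import math
from collections import defaultdict

# Hypothetical fire records: (utility_id, fire_year, acres_burned).
fires = [
    ("U1", 1995, 1200.0),
    ("U1", 2018, 9000.0),
    ("U2", 2012, 300.0),
    ("U2", 2019, 500.0),
    ("U2", 2020, 100.0),
]
utilities = ["U1", "U2", "U3"]  # U3 has no recorded fire (a zero-valued case)

# Start years of the four look-back windows described in the text.
periods = {"5-year": 2016, "10-year": 2011, "20-year": 2001, "30-year": 1991}
END_YEAR = 2020


def severity_table(start_year):
    """Return {utility: (frequency, total_acres, severity, log_severity)}."""
    count = defaultdict(int)
    acres = defaultdict(float)
    for uid, year, burned in fires:
        if start_year <= year <= END_YEAR:
            count[uid] += 1
            acres[uid] += burned
    table = {}
    for uid in utilities:
        n = count[uid]
        # Severity in 1,000 acres per event; zero when no fire occurred.
        sev = (acres[uid] / n) / 1000.0 if n > 0 else 0.0
        # Log severity: natural log after adding one, so zero stays zero.
        table[uid] = (n, acres[uid], sev, math.log(1.0 + sev))
    return table


results = {name: severity_table(start) for name, start in periods.items()}
```

In this toy data, utility U3 keeps a severity of zero in every window, which is the kind of observation the censored models in the Methodology section are meant to handle.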
Table 2 provides the descriptive statistics for the selected variables by period. Note that each subsample, which is composed of the same outcome and explanatory variables, contains missing values where the FRAP survey has no record and zero values where no wildfires occurred.

Methodology
Wildfires break out non-periodically; accordingly, our data include missing values (i.e., no survey implemented) and zero values (i.e., no wildfire occurred at all). Because wildfire severity follows a non-negative distribution censored at zero, estimation that ignores the censoring biases the coefficients obtained from regression analysis using the Ordinary Least Squares (OLS) estimator. In a similar vein, when analyzing the characteristics of water utilities that determine wildfire vulnerability, the data contains two kinds of information: (1) whether wildfire occurs and (2) the severity of wildfire. Performing OLS regression on only the latter information may lead to biased inference, because the sample of utilities with wildfires is not randomly selected unless the cases with no wildfire outbreaks are controlled for. Therefore, when the range of values taken by the outcome variable is limited and complex information is embedded, an analysis model that considers these features must be applied. Considering these, we applied a Tobit model estimated by maximum likelihood estimation (MLE) and a Heckman model using a two-step procedure for the analysis. Both models have the advantage of bias adjustment, but the Tobit model is based on the assumption that the effects of explanatory variables on the occurrence of wildfire and the magnitude of wildfire are identical. The Heckman model analyzes two separate steps: whether wildfire occurs and the severity of wildfire. As mentioned in the data section, using the same set of explanatory variables, we made separate estimates for wildfire records from 1991 to 2020 by classifying them into four different periods (i.e., a 5-year, 10-year, 20-year, and 30-year period) to compare how wildfire severity varies across periods.
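The censoring bias described above can be illustrated with a small simulation. This is a sketch on synthetic data, not the study's data: the data-generating process (a latent outcome with slope 2, censored at zero) and the `ols_slope` helper are assumptions made purely for illustration.

```python
import random

random.seed(0)


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Synthetic latent outcome y* = 2x + e, with e ~ N(0, 1).
x = [random.uniform(-2.0, 2.0) for _ in range(5000)]
y_latent = [2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Observed outcome is censored from below at zero, like wildfire severity.
y_observed = [max(0.0, v) for v in y_latent]

slope_latent = ols_slope(x, y_latent)      # close to the true slope of 2
slope_censored = ols_slope(x, y_observed)  # attenuated toward zero
print(round(slope_latent, 2), round(slope_censored, 2))
```

Running OLS on the censored outcome understates the slope relative to the latent relationship, which is the motivation for the censored and selection models used here.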

Tobit Model
The standard method for handling a limited outcome variable is to estimate the Tobit model by MLE. Given that our outcome variable, wildfire severity, cannot be negative and is continuous over a specific period, we use the Tobit model censored at zero to explain how heterogeneous water utility characteristics affect wildfire severity. We take the log of the outcome variable not only to deal with skewness and bring the distribution closer to normal but also to linearize the non-linear relationship between the outcome and explanatory variables. Since the logarithm of zero is undefined, the natural logarithm was taken after adding one to all observations (MaCurdy and Pencavel 1986). With this adjustment, an original value of zero becomes one and is converted back to zero by the log, so the transformation does not distort the meaning that no wildfire was observed or that no wildfire occurred.
This wildfire severity depends on the relevant influencing factors. The Tobit model assumes that the effect of these factors on a wildfire's occurrence and their effect on its severity are equal, and it estimates wildfire severity as a single function:

$$ y_i^* = x_i'\beta + \varepsilon_i, \qquad y_i = \begin{cases} y_i^* & \text{if } y_i^* > \tau \\ 0 & \text{if } y_i^* \le \tau \end{cases} \tag{1} $$

where $y_i$ and $y_i^*$ are, respectively, the observable wildfire severity and the latent wildfire severity (i.e., an unobservable random variable whose realized value is hidden) at each utility $i$ (i.e., $n$ totals 4,957 water utilities). The variable $x_i$ denotes utility characteristics, such as ownership (local government versus private sector), primary source (groundwater versus surface water), trade (whether a utility relies on purchased water or its own local sources), geographic location (Northern versus Southern California, and inland versus coastal California), and population size served by utilities (small-populated, medium-populated, and large-populated level). The variable $\varepsilon_i$ is a normally distributed error term with a mean of 0 and a variance of $\sigma^2$ that captures random influences on the relationship between the explanatory and outcome variables. The term $\beta$ represents the parameter to be estimated in the Tobit model, which is censored from below at the threshold value $\tau$ ($\tau = 0$ in this analysis):

$$ P(y_i > 0) = P(y_i^* > 0) = \Phi\left(\frac{x_i'\beta}{\sigma}\right) \tag{2} $$

Eq. (2) describes the probability that wildfire severity is greater than 0. The conditional expectation of the latent outcome variable $y_i^*$ is given as:

$$ E[y_i^* \mid y_i^* > 0] = x_i'\beta + \sigma \frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)} \tag{3} $$

where $\Phi$ and $\phi$ denote the standard normal cumulative distribution function (CDF) evaluated at $x_i'\beta/\sigma$ and the corresponding probability density function (PDF), respectively. Accordingly, the parameters $\beta$ and $\sigma$ are derived by maximizing the following log-likelihood of the one-limit Tobit model over the water utilities $i$ (Greene 2003):

$$ \ln L = \sum_{y_i > 0} -\frac{1}{2}\left[\ln(2\pi) + \ln\sigma^2 + \frac{(y_i - x_i'\beta)^2}{\sigma^2}\right] + \sum_{y_i = 0} \ln\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] \tag{4} $$

This log-likelihood function, with the threshold value $\tau = 0$, is the sum of the log-likelihood contributions of each utility, obtained by taking the logarithm after combining Eqs. (2) and (3). It is maximized iteratively until convergence to obtain consistent estimates of $\beta$ and $\sigma$.
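To make the one-limit Tobit log-likelihood described above concrete, here is a minimal sketch in plain Python. It only evaluates the log-likelihood for given parameter values; in practice an iterative optimizer would search over the coefficients and the scale parameter. The function names and the toy data (an intercept plus one dummy regressor) are hypothetical and are not taken from the study.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a Tobit model censored from below at zero."""
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi > 0:
            # Uncensored observation: normal log-density of the residual.
            total += (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
                      - (yi - xb) ** 2 / (2.0 * sigma ** 2))
        else:
            # Censored at zero: log P(y* <= 0) = log[1 - Phi(xb / sigma)].
            total += math.log(1.0 - norm_cdf(xb / sigma))
    return total


# Hypothetical data: columns are (intercept, dummy); y is log severity.
X = [(1.0, 1.0), (1.0, 1.0), (1.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
y = [1.8, 2.4, 0.0, 0.3, 0.0, 0.0]

ll_plausible = tobit_loglik((0.0, 1.5), 1.0, X, y)
ll_poor = tobit_loglik((5.0, -5.0), 1.0, X, y)
print(ll_plausible > ll_poor)
```

Parameter values that fit both the positive outcomes and the zeros receive a higher log-likelihood, which is what the maximization exploits.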

Heckman Model
An alternative method, Heckman's two-step procedure, is expressed as:

$$ z_i^* = w_i'\gamma + u_i, \qquad z_i = \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \tag{5} $$

$$ y_i^* = x_i'\beta + \varepsilon_i, \qquad y_i = y_i^* \ \text{ if } z_i = 1 \tag{6} $$

where $z_i$ is an observable indicator of whether a wildfire occurs, driven by the latent index $z_i^*$, and $y_i^*$ is a latent variable for wildfire severity. The variables $w_i$ and $x_i$ are the explanatory variables of utility characteristics as stated in the Tobit model. The variables $u_i$ and $\varepsilon_i$ are error terms following a bivariate normal distribution with means of 0 and standard deviations of 1 and $\sigma_\varepsilon$. The correlation between the two error terms, which is to be estimated, is denoted $\rho$ (i.e., $\mathrm{corr}(u_i, \varepsilon_i) = \rho$), and it implies that the error terms are not independent.
Since the variable $y_i^*$ can be observed only when the variable $z_i^*$ is positive, targeting the cases of $z_i = 1$ in Eq. (5) yields the conditional expectation of $y_i$ (Greene 2003):

$$ E[y_i \mid z_i = 1] = x_i'\beta + \rho\sigma_\varepsilon \lambda_i(-w_i'\gamma), \qquad \lambda_i(-w_i'\gamma) \equiv \frac{\phi(w_i'\gamma)}{\Phi(w_i'\gamma)} \tag{7} $$

where $\Phi(\cdot)$ and $\phi(\cdot)$ respectively denote the standard normal CDF and PDF, and $\lambda_i$ is a selection parameter called the inverse Mills ratio (IMR) or selection hazard. This represents the instantaneous probability that each observation will be excluded from the sample conditional on being at the hazard (Bushway et al. 2007).
Using Eq. (7), the estimation steps are as follows. In the 1st step, a Probit regression models the sample selection process in Eq. (5), estimates $\hat{\gamma}$, and calculates $\lambda_i(-w_i'\hat{\gamma})$. In the 2nd step, $\lambda_i(-w_i'\hat{\gamma})$ is added to a regression based on Eq. (6) as a regressor along with the other explanatory variables $x_i$, with the standard errors and t values adjusted accordingly. This OLS regression provides consistent estimates of both $\beta$ and the coefficient on the selection term, $\beta_\lambda = \rho\sigma_\varepsilon$. Thus, when $\rho \neq 0$, estimating wildfire severity with only the $z_i = 1$ observations and without this term produces a sample selection bias through the omitted $\lambda_i(-w_i'\gamma)$; as a result, the effect of a specific explanatory variable on the outcome variable may be underestimated or overestimated.
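The selection-hazard term at the center of the two-step procedure above can be sketched as follows. This is a simplified illustration, assuming the first-step Probit index has already been estimated by some statistical package; the index values, the helper names, and the toy regressor are hypothetical, and the IMR is written in the standard form of the standard normal PDF divided by the CDF at the Probit index.

```python
import math


def norm_pdf(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(index):
    """Selection hazard phi(w'gamma) / Phi(w'gamma) for a Probit index."""
    return norm_pdf(index) / norm_cdf(index)


def expected_severity_given_fire(xb, index, rho, sigma_eps):
    """Conditional mean of the outcome for selected cases (z_i = 1)."""
    return xb + rho * sigma_eps * inverse_mills(index)


# Hypothetical first-step Probit indices w'gamma_hat for three utilities.
probit_index = [-1.0, 0.0, 1.5]
lam = [inverse_mills(v) for v in probit_index]

# Second step: append the IMR to the regressors (intercept, x) for OLS.
x = [0.0, 1.0, 1.0]
second_step_rows = [(1.0, xi, li) for xi, li in zip(x, lam)]
```

The IMR shrinks as the Probit index grows, so utilities that are very likely to experience a wildfire receive only a small selection correction in the second step.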

Wildfire Severity and Water Utilities
Maps of calculated wildfire severity by period within the water utilities' service area boundaries in California are presented in Fig. 1. As indicated before, we define the wildfire severity in each water utility service area by dividing the total land acres burned (i.e., wildfire extent) by the total number of events that occurred (i.e., wildfire exposure frequency). Defining severity this way neutralizes the effect of "exposure frequency," so it can be used to identify vulnerable locations and the degree of vulnerability to wildfires more effectively. Otherwise, displaying raw counts rather than relative values, such as rates per unit (here, acres burned per event), might cause a misleading emphasis on bigger geographic areas. Thus, we decided that it is better to examine the utilities' vulnerability to wildfires through this relative measure of severity. Each map illustrates the severity of exposure to wildfire risk within and near the utility's service area boundaries through the severity of area burned over a 5- to 30-year period. The severity of wildfires in the last 5 years has increased compared to the last 10, 20, and 30 years. This results from a combination of high temperatures and drought due to the increasing influence of climate change (Bladon et al. 2014). These maps will assist the state's water managers, policymakers, and water utilities in determining the locations that need special management to mitigate wildfire risk.
Table 3 shows the estimates obtained from the OLS, Tobit, and Heckman models for the association between water utility characteristics and wildfire severity. The signs of all explanatory variables, except population size in the OLS model, are consistent across the three models, but the coefficient magnitudes and significance levels differ.

Overall Regression Results
In summary, all coefficient estimates from the Tobit model, which handles the common statistical problem of censored data, have the expected signs and are statistically significant at either the 10% or 1% level. All coefficients of the explanatory variables from the OLS model are smaller in absolute terms and statistically less significant (i.e., government-owned utility, 0.205 to large-populated level utility, 0.305) than those of the Tobit model (i.e., government-owned utility, 1.225 to large-populated level utility, 1.638). Likewise, the Heckman model, which treats sample selection bias, also has the expected signs; relative to the OLS model, it is statistically more significant in the selection equation but less significant in the outcome equation. All coefficients from the Heckman model are larger in absolute terms than those of the OLS model.
All coefficients of the selection equation, which determines whether or not there are wildfires in the Heckman model, are estimated to be smaller than those of the Tobit model at a high significance level: government-owned utility, 0.329 (Tobit 1.225) to large-populated level utility, 0.779 (Tobit 1.638). The outcome equation for wildfire severity shows the opposite pattern. All coefficients are estimated to be larger than those of the Tobit model, but none is statistically significant: government-owned utility, 2.192 (Tobit 1.225) to utility size, 3.620 (Tobit 1.638). This indicates that, compared to the Tobit model, the Heckman model is valid for estimating the effect of the explanatory variables on the presence or absence of a wildfire. However, it is not valid for estimating the explanatory variables' impact on a wildfire's severity. The following subsections explain the Tobit and Heckman model results, respectively.

Estimated Results for Tobit Model
First, in the results from the censored Tobit regression in Table 3, the coefficient on ownership (local government versus privately owned utilities) compares the vulnerability to wildfires by ownership. It indicates that, holding the other variables in the model constant, government-owned water utilities have greater wildfire severity than privately owned ones. This result is statistically significant at the 1% level.
Second, our results show that utilities relying on groundwater as their primary source have statistically significantly lower wildfire severity (i.e., they are less vulnerable to wildfires) than those depending on surface water, all else equal. Degradation of water quality from wildfire can adversely affect both surface water and groundwater. However, the impacts on surface water are more evident than those on groundwater (Mansilha et al. 2020). This is due to groundwater's hydrogeographic position, which makes impacts hard to see and measure. Wildfires can produce significant water quality changes that may impact many of California's reservoirs and associated water infrastructure supplying community drinking water. These impacts are created and accumulated by pollutants after the wildfire, including black ash, debris, sediment, and chemicals used to fight the fire. Post-fire runoff carries these pollutants to surface water, which may substantially affect the quality of available water. Pollutants loaded along the route can fill drainage channels, reduce reservoir holding capacities, and add contaminants to the existing water infrastructure (Moody et al. 2013; Abraham et al. 2017; Murphy et al. 2020).
Third, provided the other variables are held constant, we find that utilities mainly dependent on purchased water are less vulnerable to wildfires than utilities dependent on their local sources. These results are statistically significant at the 1% level.
Fourth, utilities in Southern California have a higher severity of wildfires relative to those in Northern California. Predictably, the result is due to different weather conditions, with less rain and hotter, drier weather in Southern California. Higher temperatures and drier conditions, such as heatwaves, often exacerbate wildfires.
Usually, in response to wildfires that threaten life, property, and high-value resources, fire retardants are used to slow the spread of wildfires, thus allowing time for firefighters to respond to the incident. However, these retardants can hardly be expected to stop fire from spreading across land parched by high temperatures and severe drought. Consequently, it is more likely that Southern California, with landscapes that are largely semi-arid with Mediterranean ecosystems and significant chaparral shrublands, is a more wildfire-prone environment than Northern California (Barro and Conard 1991; Syphard et al. 2018).
Fifth, utilities in coastal California have a higher severity of wildfires than those in inland California. The more intense wildfire outbreaks in coastal California are due to climate trends that make it easy for wildfires to spark, because offshore, downslope winds with low humidity blow from inland areas toward the coast. Burnt trees are like charcoal, so the embers last for a long time, and the wind spreads sparks from those embers into neighboring areas. The lower the humidity and the stronger the wind, the faster this diffusion and the greater the distance the fire sparks will travel (Goss et al. 2020). Large-scale wildfires in coastal California have increased by about 10% per decade since 1984 due to climate trends (Zhongming et al. 2020). These trends push existing fires into urban communities. This phenomenon causes more damage when wildfires occur near densely populated areas (e.g., Bay Area, Los Angeles, San Bernardino, and San Diego counties).
In terms of population size served by a utility, results show how much more (or less) vulnerable utilities serving a larger population are to wildfires than utilities serving a smaller population. Wildfire severity increases incrementally with population size. Specifically, compared to a utility serving a small population (i.e., fewer than 3,300 people), a utility serving a medium population (i.e., fewer than 100,000 people) and a utility serving a large population (i.e., more than 100,000 people) show incrementally greater severity. These results are all statistically significant at either the 5% or 1% level. Utilities that provide services to a larger population are often large and government-owned. These larger-population areas may also see more wildfires caused by inadvertent human actions.
Lastly, the $\sigma$ value shows that it is reasonable to choose the Tobit model to handle the case in which the severity of the wildfire is less than or equal to zero. The value was found to be statistically significant at 18.439 (p < 0.001), indicating that the Tobit model is a valid method. In the Tobit model, the effects of all explanatory variables on wildfire severity remain statistically significant.
The coefficients from the OLS model show the marginal effect, that is, the average change in the outcome variable for a given change in an explanatory variable. However, the coefficients from the Tobit model combine the average change in the outcome variable with the probability that the change will be observed. Thus, to get the marginal effect of an explanatory variable on the observed outcome variable, it is necessary to multiply the coefficient by the probability that the observed variable is greater than zero (Amore and Murtinu 2021). The value of the marginal effect is also presented alongside the coefficient in Table 3.
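The adjustment described above can be sketched in a few lines: the Tobit coefficient is scaled by the probability that the outcome is positive, evaluated at a chosen linear index. The helper name and the numbers below are hypothetical and are not taken from Table 3.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_marginal_effect(beta_k, xb, sigma):
    """Marginal effect on the observed outcome: beta_k * P(y > 0)."""
    return beta_k * norm_cdf(xb / sigma)


# Hypothetical coefficient, linear index x'beta, and scale parameter.
me_at_zero_index = tobit_marginal_effect(beta_k=1.2, xb=0.0, sigma=1.0)
me_at_high_index = tobit_marginal_effect(beta_k=1.2, xb=3.0, sigma=1.0)
```

When the probability of a positive outcome is near one, the marginal effect approaches the raw coefficient; when many observations are censored, it is considerably smaller.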

Estimated Results of Heckman Model
We estimated two equations in Heckman's two-step procedure, with the Probit model as the selection equation at the first step and the OLS model as the outcome equation at the second step. The overall results of the Heckman model in Table 3 show no difference in the direction of the effects that the explanatory variables have on the outcome variable between the first-step Probit selection model and the second-step linear (OLS) outcome model.
The selection equation determines whether or not there are wildfires. The signs of its coefficients on all explanatory variables are consistent with those from the Tobit model, but the coefficients are smaller than in the Tobit model and more statistically significant. Thus, the direction of the effects of each water utility's heterogeneous characteristics matches what the Tobit model shows. The outcome equation describes the effect of the explanatory variables on the wildfire's severity. The signs of the coefficients on all explanatory variables still match the Tobit model, yet the coefficients are not statistically significant. This means that water utilities' characteristics have a significant impact on whether a wildfire occurs, but not on its severity.
Notice that the coefficient of Lambda ($\lambda_i$), denoting the IMR, is insignificant with t-value = 1.40 (p > 0.1). Given the meaning of $\lambda_i$ as a self-selectivity correction factor, this result shows that selection bias is not a significant issue for our analysis, indicating that the Tobit model explains wildfire severity better than the two-step Heckman model. Furthermore, the Tobit model is a less complicated statistical approach. As for the other parameter, $\rho$, a positive value indicates that the unobservable factors affecting wildfire occurrence and those affecting wildfire severity are positively correlated; conversely, a negative $\rho$ indicates that they are negatively correlated. The estimate of $\rho$ in Table 3 shows that these unobserved factors are positively correlated.

Estimated Results for Different Time Periods
Tobit model estimates for wildfire severity presented in Table 4 show changes in wildfire exposure over time. These variations reflect the base period vulnerability of water utilities to wildfire. Since all estimates for each period have the same signs, the direction of the effects (either an increase or a decrease) of water utility characteristics on the severity of wildfires is the same in every period and matches the results described in the sections above. The estimates are also statistically significant at either the 5% or 1% level in every period, except for the population size variables. However, the magnitude of the effects differs by period, as the coefficients differ across periods.
Although the point estimates of the effects vary by period, it is clear that the overall effects are larger over a short period (the largest point estimates are from the last 5 years). The longer the period, the smaller the estimated fluctuation in severity despite the greater frequency of accumulated wildfires, implying that the area burned by each wildfire is not that large. In other words, since small-scale wildfires account for a large proportion of occurrences, the increase in total area burned is not large compared to a short period. This indicates that large-scale wildfires have mainly occurred in a relatively short period (the last 5 years). A large body of research suggests that burning areas have increased within the last 5 years, and the top 20 largest California
Figure 2 focuses on wildfire severity in the last 5 years by geographic location in California. As shown, Southern and coastal California utilities are more vulnerable to wildfires than those in Northern or inland California.

Conclusions
Growing wildfire size and frequency in California concern water utilities and strain them through increased costs for managing and operating existing water infrastructure. Wildfires impact both post-wildfire water quality and water infrastructure management and operation. Accordingly, identifying the water utilities at a higher risk of wildfires is necessary to develop plans and strategies for effective water management and operation. Our study contributes to the literature by measuring the vulnerability of each utility to wildfires based on the exposure frequency and the extent of area burned by wildfires that occurred in each water utility service area. Our quantitative models take into account the censoring and selection biases in wildfire data and show an association between water utility characteristics and the level of vulnerability to wildfire risks. To summarize, we find that wildfire vulnerability is higher in government-owned utilities than in private ones, in utilities primarily relying on surface water than in those relying on groundwater, in utilities using locally sourced water than in those using purchased water, in utilities located in Southern California than in those in Northern California, in utilities located in coastal California than in those in inland California, and in utilities serving a larger population. These findings provide useful information on water utilities that are likely to be more vulnerable to wildfires based on their characteristics, which may require more caution in their management and operation. After a wildfire, water utilities tend to experience increased infrastructural damages (e.g., water treatment, replacement or repair of the broken water system, water storage or distribution system) and operational challenges (e.g., emergency operation/response plans, disinfection, and boil water orders).
Introducing funding mechanisms would help water utilities with wildfire-vulnerable characteristics respond fiscally and flexibly to wildfires. Such mechanisms might include insurance, grants, subsidies, and other state and federal aid programs; through them, water agencies may be able to recover some of the post-disaster costs of water management and operation. Given that adaptation to and mitigation of wildfire risk are inextricably linked to the provision of a safe water supply, it is essential to build partnerships across the diverse forest and water entities at all levels: federal, state, local municipalities, communities, and nongovernment.