The following describes the available data sets and how each is used to construct the TDS model. Resistivity, porosity, temperature, and bicarbonate concentration data are used to make TDS predictions at discrete locations in the volume of analysis. Those values are interpolated throughout the entire volume using kriging. The TDS model is parameterized using mathematical optimization with the sum of squared residuals as the objective function to be minimized. Note that a similar method using borehole measurements for salinity estimation has been termed the resistivity-porosity (RP) method in some antecedent literature (Lyle 1988; Peterson 1991; Lindner-Lunsford and Bruce 1995; Schnoebelen et al. 1995; Hamlin and Rocha 2015; Gillespie et al. 2017).
Data
This work relies on the following six datasets, to be discussed in turn:
1. TDS measurements from produced water samples from 64 oil wells
2. TDS measurements from groundwater samples from 152 water wells
3. Borehole logs for formation resistivity from 40 oil wells
4. Porosity model inputs from 10 oil wells
5. Temperature model inputs from 37 oil wells
6. Bicarbonate model inputs from 27 oil wells
Figure S1 of the electronic supplementary material (ESM) shows the date distributions of datasets 1 and 3.
Dataset 1: produced water TDS measurements
Geochemical measurements of produced water samples from Fruitvale and Rosedale Ranch oil fields are used as the ground truth for TDS when setting the first-order parameters of the model (Fig. 1). The data come from a compilation of produced water chemical analyses (DOGGR 2017a), a Division of Oil, Gas, and Geothermal Resources (DOGGR) database for underground injection control (UIC) wells that includes data for the TDS of water for selected wells and zones, and the US Geological Survey National Produced Water Database (Gillespie et al. 2017; Metzger et al. 2018; USGS 2017). These data sources list data collected by well operators.
This dataset consists of 64 data points, each with x and y (denoting well location), z (elevation relative to sea level, as calculated from the depth of the top perforation of the well), and a TDS value. The data are available from Metzger et al. (2018). Counts by oil field and stratigraphic unit are shown in Table 1.
Table 1 Number of TDS measurements by oil field and stratigraphic unit
Dataset 2: water well TDS measurements
Historical geochemical analyses of samples from water wells in the area compiled by Metzger et al. (2018) provide an additional 152 TDS values (Fig. 1). Water wells are shallow compared to oil and gas wells and are typically used for irrigation or as domestic or public water supply. Bottom perforations range from 114 m (375 ft) above sea level to 112 m (367 ft) below sea level. These observations are used only to visualize TDS at shallow depths where the geophysical logs do not provide coverage. They are not used in the setting of model parameters because they are out of the coverage area of geophysical logs.
Dataset 3: borehole logs for formation resistivity
Borehole logs (also known as geophysical well logs or well logs) were obtained from the DOGGR website (DOGGR 2017b) as raster images, which were digitized into numerical values with a commercial software program from Neuralog (Neuralog 2018). Most well logs prior to the 1980s lack the data needed to estimate formation porosity, but older logs such as these comprise the primary dataset due to their availability and spatial distribution in the study area (Fig. 1). These logs have readings for spontaneous potential (SP) and formation resistivity (Rt) with depth. Formation resistivity readings are essentially a depth-continuous measurement but, for this dataset, a discrete number of measurements were chosen at depths that correspond to clay-free (“clean”) sands, to minimize the impact of electrical charges present in clay minerals and associated bound water on the results. The clean sand zones were found by analyzing the SP curve and the deep and shallow resistivity curves (for details see p. 77, Asquith and Krygowski 2004). Measurements were discarded in the oil-bearing zone, which was inferred from the perforation interval, core analyses, mudlogs, and driller’s notes when available, because the presence of oil and gas increases resistivity so that it no longer accurately represents the resistivity of the formation water. Together, these selections exclude the effects of clay and hydrocarbons and allow the assumption that all resistivity measurements represent only rock and water.
From 40 wells (27 from Fruitvale and 13 from Rosedale Ranch), 364 data points are derived (Fig. S2 of the ESM), each with x and y (denoting well location), z (denoting elevation relative to sea level), and Rt (formation resistivity). The data are available from Stephens et al. (2018).
Dataset 4: porosity model inputs
TDS is inferred from Rt using Archie’s Equation (Archie 1942). This equation requires a value for the formation porosity, which for these purposes is defined as the relative volume of water in pore spaces, excluding clay-bound water or crystallization water (Ellis and Singer 2007). This quantity can be inferred by combining readings of a gamma-gamma density log and a neutron-porosity log (Asquith and Krygowski 2004). Since these logs are not available for most wells in the primary dataset, a porosity model is constructed for each oil field using logs from 10 wells with porosity logs, 5 for Fruitvale and 5 for Rosedale Ranch. This dataset consists of the continuous logs for each well (Stephens et al. 2018). The construction of the porosity model is discussed in the following section ‘Porosity model’.
Dataset 5: temperature model inputs
Like porosity logs, temperature logs are not plentiful in the study area; however, the maximum temperature within the borehole is generally recorded on well log headers. In zones not affected by thermal enhanced oil recovery operations, which are not used in the study area, the maximum temperature is assumed to be at the bottom of the well and can be used to calculate a temperature gradient within the oil field. This dataset consists of bottom hole temperatures and associated depths from 16 wells in Fruitvale and 21 wells in Rosedale Ranch (Stephens et al. 2018).
Dataset 6: bicarbonate model inputs
Log analysis procedures for determining TDS from resistivity assume the water is a Na-Cl type water. While this is the case for much of the study area, some zones within the Fruitvale field contain Na-HCO3 type water. Because the electrical properties of chloride and bicarbonate are different, the model must account for elevated concentrations of bicarbonate where it occurs (Alger 1966; Chart Gen-4 in Schlumberger 2009, p. 5). Measurements of HCO3− concentrations from 27 oil and gas wells (Gans et al. 2018) are used to construct a bicarbonate model that is used within the TDS model to predict TDS within the Fruitvale field.
There are additional bicarbonate data from water wells in the study area, which were not used in the bicarbonate model. As with measurements of TDS from water wells (dataset 2) the bicarbonate data from water wells are out of the coverage area of the geophysical logs.
Porosity model
The porosity model gives predictions for sand bed porosity by depth in each of the two oil fields in the study area. It is constructed in four steps, all of which are conventional in well log interpretation (Asquith and Krygowski 2004; Ellis and Singer 2007).
First, the density log reading is converted into a “density porosity” by assuming the density of the rock matrix is known, via:
$$ {\phi}_{\mathrm{D}}=\frac{\rho_{\mathrm{ma}}-{\rho}_{\mathrm{b}}}{\rho_{\mathrm{ma}}-{\rho}_{\mathrm{fl}}} $$
(1)
where the terms are as defined:
- ϕD: Density-derived porosity (dimensionless)
- ρma: Rock matrix density, assumed to be 2.65 g/cm3
- ρb: Measured formation density (g/cm3)
- ρfl: Fluid density, assumed to be 1 g/cm3
Secondly, density porosity and neutron porosity readings are compared to identify, for each well, the depths at which the sand is most likely to be clean (clay free). This is accomplished by only considering porosity measurements where the neutron and density curves are within 2% of each other, thus yielding 936 “sand points” for Fruitvale and 1267 for Rosedale Ranch.
Thirdly, density porosity (ϕD) and neutron porosity (ϕN) are combined via the root mean square formula from Asquith and Krygowski (2004) to obtain a composite estimate for porosity (ϕN − D) at each sand point, through:
$$ {\phi}_{\mathrm{N}-\mathrm{D}}=\sqrt{\frac{\phi_{\mathrm{N}}^2+{\phi}_{\mathrm{D}}^2}{2}} $$
(2)
Finally, lines are fitted to plots of ϕN − D versus depth for each oil field to obtain the porosity model (Fig. 3).
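The four steps can be illustrated with a minimal Python sketch. The sample values are hypothetical, the 2% agreement criterion is read here as 0.02 porosity units (an assumption, since the text does not say whether the threshold is absolute or relative), and the helper names are chosen for illustration only; this is not the authors' code.

```python
import math

RHO_MA = 2.65  # rock matrix density (g/cm3), as assumed in the text
RHO_FL = 1.0   # fluid density (g/cm3), as assumed in the text

def density_porosity(rho_b):
    """Eq. 1: density porosity from measured bulk density (g/cm3)."""
    return (RHO_MA - rho_b) / (RHO_MA - RHO_FL)

def is_clean_sand(phi_n, phi_d, tol=0.02):
    """Step 2: keep points where neutron and density porosity agree.
    Assumption: '2%' is interpreted as 0.02 porosity units."""
    return abs(phi_n - phi_d) <= tol

def neutron_density_porosity(phi_n, phi_d):
    """Eq. 2: root-mean-square combination of the two porosities."""
    return math.sqrt((phi_n ** 2 + phi_d ** 2) / 2.0)

def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical log samples: (depth, bulk density in g/cm3, neutron porosity)
samples = [
    (500.0, 2.10, 0.34),
    (800.0, 2.15, 0.31),
    (1100.0, 2.20, 0.36),  # curves disagree, so this point is rejected
    (1400.0, 2.25, 0.25),
]

sand_points = []
for depth, rho_b, phi_n in samples:
    phi_d = density_porosity(rho_b)
    if is_clean_sand(phi_n, phi_d):
        sand_points.append((depth, neutron_density_porosity(phi_n, phi_d)))

# Step 4: porosity as a linear function of depth for one oil field
slope, intercept = fit_line([d for d, _ in sand_points],
                            [p for _, p in sand_points])
```

In this toy data set, three of the four samples pass the clean-sand filter and the fitted porosity decreases with depth.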
Temperature model
The temperatures and corresponding depths were fit with a linear regression to create a temperature model for each oil field (Fig. 3). The equation of the best fit line was used to calculate a temperature at every depth for each of the wells in the analysis (Gillespie et al. 2017).
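As a minimal sketch of this step, the following Python fragment fits a straight line to bottom-hole temperatures and evaluates it at an arbitrary depth. The depth and temperature pairs are hypothetical, and the units (feet and degrees Fahrenheit) are assumed for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (depth in ft, bottom-hole temperature in deg F) pairs
bht = [(3000.0, 120.0), (4000.0, 138.0), (5000.0, 150.0), (6000.0, 168.0)]

gradient, surface_temp = fit_line([d for d, _ in bht], [t for _, t in bht])

def temperature_at(depth_ft):
    """Temperature predicted by the best-fit line at a given depth."""
    return surface_temp + gradient * depth_ft
```

One such line would be fitted per oil field, and `temperature_at` would then supply T for each sand point.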
Bicarbonate model
The Fruitvale field has elevated levels of bicarbonate in produced water (Fig. S3 of the ESM). This is likely due to meteoric recharge brought into Fruitvale by the Kern River, which is also the reason Rosedale Ranch does not have high concentrations of bicarbonate. The plot of the ratio [HCO3−]/TDS against the logarithm of TDS was fit with a sigmoid curve to model the fact that bicarbonate predominates when TDS is low, and tapers in significance as TDS increases (Fig. 4). The sigmoid’s lower asymptote is set to zero, and the upper asymptote is set to 0.73, which is the value of [HCO3−]/TDS in a sodium bicarbonate solution. The sigmoid function was selected to model the bicarbonate fraction in TDS because the behavior at the lower and upper ranges of TDS could be controlled by setting the asymptotes. As noted already, the fraction of bicarbonate in a solution cannot exceed 0.73. As TDS increases, chloride becomes the predominant anion and the bicarbonate fraction becomes insignificant (demonstrated by data from Kharaka and Hanor 2004). This bounded behavior could not be modeled with other functions (e.g. linear or polynomial).
The bicarbonate model is used within the TDS model to derive TDS estimations in zones where bicarbonate concentrations are significantly contributing to resistivity responses read from the well logs. TDS would be underestimated in these zones without this correction. The application of the bicarbonate model is discussed further in the following section.
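A sigmoid with these asymptotes can be sketched as follows. The exact functional form and the fitted parameter values are not given in the text, so the logistic form in log10(TDS) and the two parameters (`steepness`, `midpoint_log10`) are assumptions made for illustration only.

```python
import math

UPPER_ASYMPTOTE = 0.73  # [HCO3-]/TDS in a sodium bicarbonate solution (from the text)

def bicarbonate_fraction(tds_ppm, steepness=3.0, midpoint_log10=3.3):
    """Assumed logistic curve in log10(TDS).

    Tends to 0.73 as TDS becomes small and to 0 as TDS becomes large.
    The two parameter values are hypothetical placeholders, not fitted values.
    """
    x = math.log10(tds_ppm)
    return UPPER_ASYMPTOTE / (1.0 + math.exp(steepness * (x - midpoint_log10)))
```

In practice the two free parameters would be obtained by nonlinear least squares against the measured [HCO3−]/TDS ratios.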
TDS model
The TDS model takes resistivity readings at sand points as input, and incorporates outputs from the porosity, bicarbonate, and temperature models to derive a mean and variance for groundwater TDS at all points in the volume of analysis. It is constructed in multiple steps. To summarize:
- Archie’s law is used to estimate the resistivity of groundwater from formation resistivity, with input from the porosity model.
- Groundwater resistivity is converted to TDS values, with input from the temperature model and the bicarbonate model.
- TDS values are interpolated to the whole volume by kriging.
Using Archie’s law to find groundwater resistivity
Archie (1942) found empirically that in brine-saturated (hydrocarbon-free) sand beds, bulk resistivity and brine resistivity are related as follows:
$$ {R}_0=F\times {R}_{\mathrm{w}} $$
(3)
where the terms are as defined:
- R0: Resistivity of 100% water-saturated rock (ohm-m)
- F: Formation factor
- Rw: Resistivity of the water (ohm-m)
The formation factor is related to the porosity of the rock by:
$$ F=\frac{a}{\phi^m} $$
(4)
where:
- a: Tortuosity factor
- ϕ: Porosity (dimensionless)
- m: Cementation factor
The parameters a and m vary by rock type and location. If they are known, and if porosity is known (or in this case, obtained from a porosity model), brine resistivity can be estimated by:
$$ {R}_{\mathrm{w}}={R}_0\left(\frac{\phi^m}{a}\right) $$
(5)
Ideally a and m are determined by lab analysis of borehole cores, which are not available in the study area. Instead, a and m for each oil field are determined from the optimized solution of the TDS model to fit laboratory TDS values of produced water. This is discussed in section ‘TDS model parameterization’.
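The effect of a and m on the estimated water resistivity can be seen in a short Python sketch of Eqs. 4 and 5. The input values (R0 = 10 ohm-m, ϕ = 0.30) are hypothetical; the two parameter pairs are the Humble (a = 0.62, m = 2.15) and Archie (a = 1.0, m = 2.0) parameterizations.

```python
def formation_factor(phi, a, m):
    """Eq. 4: F = a / phi^m."""
    return a / phi ** m

def water_resistivity(r0, phi, a, m):
    """Eq. 5: Rw = R0 * phi^m / a (ohm-m)."""
    return r0 * phi ** m / a

# Hypothetical sand point: R0 = 10 ohm-m, porosity = 0.30
rw_humble = water_resistivity(10.0, 0.30, a=0.62, m=2.15)
rw_archie = water_resistivity(10.0, 0.30, a=1.0, m=2.0)
```

For this sand point the Humble parameterization yields the higher water resistivity, and therefore the lower TDS, of the two.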
From groundwater resistivity to TDS
Deriving TDS from the resistivity of a brine solution is a three-step process. Firstly, calculate the resistivity that the brine would have at 75 °F (Asquith and Krygowski 2004):
$$ {R}_{\mathrm{w}75}={R}_{\mathrm{w}}\times \frac{T+6.77}{75+6.77} $$
(6)
where T = temperature (degrees Fahrenheit).
Next, calculate the TDS (ppm) of a NaCl solution with this resistivity (Bateman and Konen 1978):
$$ {\mathrm{TDS}}_{\mathrm{NaCl}}={10}^{\left[\left(3.562-{\log}_{10}\left({R}_{\mathrm{w}75}-0.0123\right)\right)/0.955\right]} $$
(7)
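Equations 6 and 7 translate directly into code. The following sketch uses hypothetical input values, and the guard against a non-positive logarithm argument is an addition for robustness rather than part of the cited formulas.

```python
import math

def resistivity_at_75f(rw, temp_f):
    """Eq. 6: convert water resistivity at temp_f (deg F) to its value at 75 deg F."""
    return rw * (temp_f + 6.77) / (75.0 + 6.77)

def tds_nacl_from_rw75(rw75):
    """Eq. 7: NaCl-equivalent TDS (ppm) from resistivity at 75 deg F (ohm-m)."""
    if rw75 <= 0.0123:
        # Added guard: the logarithm in Eq. 7 is undefined at or below this value
        raise ValueError("rw75 must exceed 0.0123 ohm-m")
    return 10.0 ** ((3.562 - math.log10(rw75 - 0.0123)) / 0.955)

# Hypothetical example: Rw = 0.9 ohm-m measured at 150 deg F
rw75 = resistivity_at_75f(0.9, 150.0)
tds_nacl = tds_nacl_from_rw75(rw75)
```

Because water is more resistive when cooler, `rw75` exceeds the input resistivity whenever the formation temperature is above 75 °F.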
Lastly, if needed, derive TDS from TDSNaCl in zones where sodium and chloride are not the predominant ions. Bicarbonate and chloride ions have similar electrical mobility, but the greater molar mass of bicarbonate implies higher TDS for solutions with bicarbonate than for a pure NaCl solution, if resistivity is kept constant (Alger 1966). A conversion chart in conjunction with the bicarbonate model is used to derive TDS from TDSNaCl.
To approximate the Schlumberger chart (Chart Gen-4 in Schlumberger 2009, p. 5), one can use the equation
$$ {\mathrm{TDS}}_{\mathrm{NaCl}}=\left[\mathrm{Na}\right]+\left[\mathrm{Cl}\right]+0.345\bullet \left[{{\mathrm{HCO}}_3}^{-}\right] $$
(8)
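where TDSNaCl is the concentration of a NaCl solution with the same resistivity as a brine that contains bicarbonate. A function f(TDS) that gives [HCO3−]/TDS for TDS in the domain of interest was constructed in the section ‘Bicarbonate model’ (Fig. 4). Assuming that sodium, chloride, and bicarbonate make up essentially all of the dissolved solids, so that [Na] + [Cl] = TDS − [HCO3−] and [HCO3−] = TDS • f(TDS), combining the two equations gives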
$$ {\mathrm{TDS}}_{\mathrm{NaCl}}=\mathrm{TDS}\bullet \left[1-f\left(\mathrm{TDS}\right)\right]+0.345\bullet \mathrm{TDS}\bullet f\left(\mathrm{TDS}\right) $$
(9)
This simplifies to
$$ {\mathrm{TDS}}_{\mathrm{NaCl}}=\mathrm{TDS}\bullet \left[1-0.655\bullet f\left(\mathrm{TDS}\right)\right] $$
(10)
Since the target is TDS in terms of TDSNaCl, the mathematical inverse of this relationship is desired, but a closed-form mathematical expression does not exist. Thus, fixed-point iteration is used to solve for TDS. This equation can be rearranged to get:
$$ \mathrm{TDS}=\frac{{\mathrm{TDS}}_{\mathrm{NaCl}}}{1-0.655\bullet f\left(\mathrm{TDS}\right)} $$
(11)
An approximation for TDS can be found by substituting TDSNaCl for TDS on the right-hand side:
$$ {\mathrm{TDS}}_{\mathrm{approx}}=\frac{{\mathrm{TDS}}_{\mathrm{NaCl}}}{1-0.655\bullet f\left({\mathrm{TDS}}_{\mathrm{NaCl}}\right)} $$
(12)
This approximation can be used to get a better approximation by resubstituting it for TDS:
$$ {\mathrm{TDS}}_{\mathrm{approx}2}=\frac{{\mathrm{TDS}}_{\mathrm{NaCl}}}{1-0.655\bullet f\left({\mathrm{TDS}}_{\mathrm{approx}}\right)} $$
(13)
This process can be repeated until convergence. In practice, five iterations suffice for a good approximation. Note that when [HCO3−] is zero, TDSNaCl equals TDS. Applying this methodology to derive TDS from TDSNaCl to account for bicarbonate improves the TDS model prediction error (discussed further in section ‘TDS model parameterization’).
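The fixed-point iteration can be sketched in a few lines of Python. The sigmoid used for f(TDS) is an assumed logistic form with hypothetical parameters, included only so that the example is self-contained; the iteration itself follows the scheme of Eqs. 11 to 13.

```python
import math

def bicarbonate_fraction(tds_ppm, steepness=3.0, midpoint_log10=3.3):
    """Assumed stand-in for f(TDS): logistic in log10(TDS), asymptotes 0.73 and 0.
    Parameter values are hypothetical."""
    x = math.log10(tds_ppm)
    return 0.73 / (1.0 + math.exp(steepness * (x - midpoint_log10)))

def tds_from_tds_nacl(tds_nacl, f=bicarbonate_fraction, n_iter=5):
    """Solve TDS = TDS_NaCl / (1 - 0.655 * f(TDS)) by fixed-point iteration."""
    tds = tds_nacl  # starting guess, as in Eq. 12
    for _ in range(n_iter):
        tds = tds_nacl / (1.0 - 0.655 * f(tds))
    return tds

# Hypothetical example: NaCl-equivalent TDS of 1500 ppm
tds = tds_from_tds_nacl(1500.0)
# Residual of Eq. 10, relative to the input; small if the iteration has converged
residual = abs(tds * (1.0 - 0.655 * bicarbonate_fraction(tds)) - 1500.0) / 1500.0
```

With this assumed f, the residual of Eq. 10 after five iterations is a small fraction of a percent, and the corrected TDS exceeds the NaCl-equivalent value, as expected for a bicarbonate-bearing water.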
Kriging TDS
The TDS values derived from borehole sand points are spatially discrete (Fig. S2 of the ESM). Three-dimensional ordinary kriging was used to interpolate log TDS (the logarithm of TDS), to obtain a mean and variance for this quantity at all points in the volume of analysis. The initial motivation for transforming TDS was to have an interpretation for negative interpolated values. Subsequently, examination of the orthonormal residuals (Kitanidis 1991) showed that log TDS comports with modeling assumptions much better than untransformed TDS, as is discussed in the following.
Kriging requires the analyst to choose a model for how observations (log TDS values) covary as a function of distance. An initial inspection of the measured TDS values helped to determine two aspects of the covariance structure. Firstly, it became apparent that the coordinate system could be made isotropic by scaling z (depth) up by a factor of 10 (Kitanidis 1997; Olea 2012). When measured TDS values are projected onto the A–A′ plane that crosscuts both fields (Fig. 1), the TDS trends about as much from top to bottom as from side to side, and this plane has a height-to-width ratio on the order of 10. Secondly, the apparent TDS trend suggests that log TDS lacks spatial stationarity in the mean, or at least that the volume of analysis was too small to warrant positing a stationary mean. Thus, a linear variogram model was used for log TDS (Kitanidis 1997), which has just two parameters, nugget (y-intercept) and slope. These parameters were set by fitting the experimental variogram (Fig. S4 of the ESM).
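The variogram step can be illustrated with a standard-library Python sketch: z is scaled by a factor of 10, semivariances of log TDS are averaged in lag bins, and a line is fitted to obtain the nugget and slope. The data points, the lag width, and the use of an unweighted least-squares fit are assumptions for illustration; the fitting procedure actually used may differ.

```python
import math

Z_SCALE = 10.0  # vertical scaling factor from the text

def scaled_distance(p, q, z_scale=Z_SCALE):
    """Euclidean distance after stretching z to make the coordinates isotropic."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                     + (z_scale * (p[2] - q[2])) ** 2)

def experimental_variogram(points, values, lag_width):
    """Mean semivariance per lag bin, returned as (bin-center lag, gamma) pairs."""
    sums, counts = {}, {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            b = int(scaled_distance(points[i], points[j]) // lag_width)
            sums[b] = sums.get(b, 0.0) + 0.5 * (values[i] - values[j]) ** 2
            counts[b] = counts.get(b, 0) + 1
    return [((b + 0.5) * lag_width, sums[b] / counts[b]) for b in sorted(sums)]

def fit_linear_variogram(variogram):
    """Unweighted least-squares line gamma(h) = nugget + slope * h."""
    hs = [h for h, _ in variogram]
    gs = [g for _, g in variogram]
    n = len(hs)
    mean_h, mean_g = sum(hs) / n, sum(gs) / n
    shh = sum((h - mean_h) ** 2 for h in hs)
    shg = sum((h - mean_h) * (g - mean_g) for h, g in zip(hs, gs))
    slope = shg / shh
    return mean_g - slope * mean_h, slope

# Hypothetical sand points (x, y, z in m) and their log10 TDS values
points = [(0.0, 0.0, 0.0), (1000.0, 0.0, -50.0), (2000.0, 500.0, -100.0),
          (3000.0, 500.0, -200.0), (4000.0, 1000.0, -300.0)]
log_tds = [3.0, 3.2, 3.5, 3.9, 4.2]

variogram = experimental_variogram(points, log_tds, lag_width=1500.0)
nugget, slope = fit_linear_variogram(variogram)
```

The fit requires at least two populated lag bins; with real data, the lag width would be chosen so that each bin contains enough pairs to give a stable average.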
Variogram fitting and kriging predictions are performed using the Python software package PyKrige (Murphy 2018). PyKrige was also used to analyze krige residuals (Isaaks and Srivastava 1989; Kitanidis 1991, 1997), to verify the modeling assumption that the values being kriged obey a multivariate normal distribution. In this analysis, data points (x, y, z, log TDS) are put one at a time into the kriging model, but before each data point is put in, the model is used to predict log TDS at that location. The difference (the residual) is scaled by the square root of the prediction variance. If modeling assumptions are correct, the scaled residuals should have a unit normal distribution. Comparing the first four moments of this empirical distribution with those of a unit normal shows that taking the logarithm of TDS before kriging was appropriate in this case (Table 2).
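The moment comparison can be sketched as follows. The residual triples are hypothetical, and the choice of standardized skewness and kurtosis for the third and fourth moments is an assumption, since the text does not state which form of the moments is tabulated.

```python
import math

def scaled_residual(observed, predicted, variance):
    """Residual divided by the square root of the kriging prediction variance."""
    return (observed - predicted) / math.sqrt(variance)

def four_moments(xs):
    """Mean, variance, skewness, and kurtosis (population forms)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2

UNIT_NORMAL_MOMENTS = (0.0, 1.0, 0.0, 3.0)

# Hypothetical (observed log TDS, predicted log TDS, prediction variance) triples
checks = [(3.50, 3.10, 0.04), (3.20, 3.35, 0.02), (4.00, 3.90, 0.05),
          (3.70, 3.95, 0.03), (3.10, 3.05, 0.04)]
residuals = [scaled_residual(o, p, v) for o, p, v in checks]
moments = four_moments(residuals)
```

The closer `moments` is to `UNIT_NORMAL_MOMENTS`, the better the kriged quantity agrees with the normality assumption; a handful of points, as here, is far too few for a meaningful comparison.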
Table 2 Scaled residual empirical moments, compared to unit normal moments
TDS model parameterization
As already described, the TDS model consists of four steps: Archie’s Equation, resistivity-to-salinity conversion, accounting for bicarbonate in Fruitvale, and kriging. Only the first part has parameters that cannot be derived from model inputs or set by general geological/geophysical knowledge. Resistivity-to-salinity conversion (the second part) has no parameters. The bicarbonate model (the third part) has two parameters that are used to fit a sigmoidal function to the bicarbonate and TDS data, but they are found by fitting the data with nonlinear least squares. Kriging (the fourth part) has three parameters: a z-scale for anisotropy, the nugget, and the slope of the variogram. The z-scale was set by inspecting measured TDS values, and the nugget and variogram slope were set by computing the experimental variogram; therefore, kriging parameters are fully determined by model inputs. Archie’s Equation parameters (a and m) are the only parameters of the TDS model that remain free.
When borehole core analyses are unavailable, a common approach to setting a and m is to use values from a previous study where the rocks seem to be similar in description (Winsauer et al. 1952; Carothers 1968; Porter and Carothers 1970; Carothers and Porter 1971). The Humble parameterization (a = 0.62 and m = 2.15), established by Winsauer et al. (1952), is recommended for unconsolidated sands. Other settings often used are the Archie parameterization (a = 1.0, m = 2.0) and the Tixier parameterization (a = 0.81, m = 2.0), which are recommended for consolidated sands (Archie 1942; Lyle 1988; Asquith and Krygowski 2004).
To see how such an approach would fare, the model was run with the Humble parameterization, and predictions were made for each of the 64 points of dataset 1. Figure 5a shows a cross-plot of predicted versus observed TDS values. The model tends to underestimate TDS, with a root-mean-square error (RMSE) of 0.42, but not in all parts of the study area. The predictions are fairly accurate in the Fruitvale field, but too low in Rosedale Ranch. A higher value for a would have caused predicted TDS values to be higher overall. The model was then run with the Archie parameterization (a = 1.0, m = 2.0) and the results are shown in Fig. 5b. The RMSE dropped to 0.37 and the model now slightly overpredicts in the Fruitvale field.
These preliminary analyses suggest that a and m vary within the study area, and that a good model might have separate a and m parameters in each oil field. As for setting these parameters, since it is not known how the lithology of any zone compares to lithologies from previous studies, a practical approach would be to set the parameters by trying to match predicted TDS values with measured produced water TDS values. What needs to be decided, then, is how many a and m parameters there should be, and how to partition the volume of analysis. The zones should be numerous enough to account for any significant variation, but not so many that the calibration data are spread too thin. Two zones were selected by distinguishing between the two oil fields, so that each field is a zone. Each field was assigned a separate a and m.
Mathematical optimization is used to find settings for the four model parameters (a and m in each field). The sum of squared residuals (between predicted TDS and measured produced water TDS values) is the objective function to be minimized. This function incorporates the full TDS model (with Archie’s Equation, resistivity to TDS conversion, the bicarbonate correction, and kriging) and compares model outputs against produced water TDS values. Optimization results are shown in Table 3. With this parameterization, RMSE drops to 0.23, and there is no longer systematic over- or under-prediction of TDS (Fig. 5c).
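The idea of calibrating a and m against measured TDS can be shown with a deliberately simplified Python sketch for a single field. It departs from the full procedure in several labeled ways: a coarse grid search stands in for the optimizer (which the text does not name), the bicarbonate correction and kriging are omitted, residuals are taken in log10 TDS (an assumption), and all data are synthetic.

```python
import math

def predict_log_tds(r0, phi, temp_f, a, m):
    """Archie's law (Eq. 5), temperature correction (Eq. 6), and Eq. 7, in log10 units.
    Simplification: no bicarbonate correction and no kriging."""
    rw = r0 * phi ** m / a
    rw75 = rw * (temp_f + 6.77) / (75.0 + 6.77)
    return (3.562 - math.log10(rw75 - 0.0123)) / 0.955

def sum_squared_residuals(params, sand_points, observed_log_tds):
    """Objective function: sum of squared residuals in log10 TDS."""
    a, m = params
    return sum((predict_log_tds(r0, phi, t, a, m) - obs) ** 2
               for (r0, phi, t), obs in zip(sand_points, observed_log_tds))

def grid_search(sand_points, observed_log_tds):
    """Coarse grid search over (a, m); a stand-in for a proper optimizer."""
    a_grid = [round(0.5 + 0.05 * i, 2) for i in range(15)]  # 0.50 to 1.20
    m_grid = [round(1.5 + 0.05 * j, 2) for j in range(17)]  # 1.50 to 2.30
    candidates = ((a, m) for a in a_grid for m in m_grid)
    return min(candidates,
               key=lambda p: sum_squared_residuals(p, sand_points, observed_log_tds))

# Synthetic sand points: (R0 in ohm-m, porosity, temperature in deg F)
sand_points = [(12.0, 0.20, 130.0), (9.0, 0.25, 140.0),
               (7.0, 0.30, 150.0), (5.0, 0.35, 160.0)]
# Synthetic "measurements" generated with a = 0.8, m = 1.9
observed = [predict_log_tds(r0, phi, t, 0.8, 1.9) for r0, phi, t in sand_points]

best_a, best_m = grid_search(sand_points, observed)
```

Because the synthetic observations are noise-free and porosity varies between points, the search recovers the generating pair. With real data and nearly constant porosity, a and m would trade off against each other and the minimum would be much less sharply defined.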
Table 3 Results of TDS model parameter optimization
The geological justification for this approach is that the environment of deposition and/or sediment sources affects physical rock characteristics, which in turn determine a and m. For example, depositional environment can affect the shale volumes within sand beds, and greater shale content results in lower formation resistivity, which can be modeled by lower a values (Worthington 1993). Partitioning the volume of analysis allows the model to account for not only variable clay content, but also subtle differences in pore geometry and rock cementation that are represented by Archie parameters.
Cross-validation results show that the estimated typical relative prediction accuracy for the area of analysis as a whole is 22% (see section S1 in the ESM). Also, as noted before, applying the bicarbonate model within the TDS model improves performance: without the bicarbonate model the RMSE is 0.30, compared to 0.23 with it.
Because the input data set for the TDS model was collected over four decades, one must consider whether there is a temporal bias in the data; for example, if groundwater TDS changes through time, older resistivity readings may not reflect later conditions. However, TDS model errors are not correlated with the dates the water samples were collected (Pearson r = −0.13, see Fig. S1c in the ESM). This suggests there is no significant temporal bias in the data (see section S2 in the ESM for further discussion).