Background & Summary

The accumulation of carbon dioxide (CO2) in the atmosphere as a result of human activities, and the uptake of ~25% of anthropogenic CO2 by the ocean1,2, has led to increasing acidity of ocean waters of about −0.016 pH units per decade on a global scale since the 1980s3,4,5,6. This ocean acidification (OA) signal is measurable at time series sites7,8, observed in mapped data products of CO2 partial pressure6,9,10,11,12, captured by decadal repeat hydrographic cruises13,14, and simulated by ocean models15 and coupled Earth system models5,16,17. Superimposed on steady increases in accumulated anthropogenic carbon (Cant) and decreases in ocean pH, however, are various modes of temporal (e.g., diurnal, seasonal, interannual) and spatial (e.g., latitudinal, nearshore–offshore) variability, which are particularly pronounced in coastal ecosystems. This variability in coastal OA brings unique impacts to marine organisms that reside in coastal zones and are vulnerable to corrosive waters18.

Large marine ecosystems (LMEs) are ocean regions that border coastlines and are characterized by distinct bathymetry, hydrography, productivity, and trophic structure19. LMEs encompass estuaries and river mouths, nearshore coastal zones, continental shelves, and the outer margins of ocean current systems. Typically, the offshore boundary of an LME extends to the continental shelf break or to the seaward edge of a current system. Due to their coastal proximity, LMEs tend to be natural hotspots of variability in carbon cycling and rapid exchange between carbon pools. For example, intense surface primary productivity in the coastal ocean is fueled by nutrients from river input, atmospheric deposition, and coastal upwelling18; sinking organic matter from surface production leads to intense respiration throughout the water column and at the seafloor20; and high rates of sedimentation are observed in LMEs from both biogenic and lithogenic inputs21.

Ongoing anthropogenic climate drivers coupled with the natural processes occurring in coastal ecosystems make it challenging to attribute modes of OA variability to the appropriate driving mechanisms. For example, anthropogenic eutrophication from freshwater runoff and atmospheric pollution can augment natural nutrient inputs, leading to even greater net primary production in coastal surface waters and greater respiration in subsurface waters. Whereas the direct effect of CO2 uptake by primary producers mitigates OA at the surface, highly respired subsurface waters can be laterally transported and upwelled onto the continental shelf, leading to enhanced OA in the surface waters there18,22. These and other OA-modulating processes differ across ecosystems23, but their impacts are frequently correlated with environmental driver variables such as sea surface height, temperature, salinity, and chlorophyll-a concentration. These correlations allow OA metrics to be reconstructed from measurements and data products that are available at high spatial and temporal resolution11,24,25.

The data product described here is based on direct observations, which are used to reconstruct a recent history of surface ocean OA indicators at monthly, 0.25° resolution in U.S. LMEs. Observations are from a publicly available, annually updated database of surface CO2 observations: the Surface Ocean CO2 Atlas (SOCAT)26. SOCAT is an international data synthesis effort that has facilitated the production of global surface CO2 flux maps27 that contribute data-constrained estimates of the ocean CO2 sink in the Global Carbon Budget28. We also rely on publicly available satellite-derived surface ocean properties and data reanalysis products to leverage the predictive power of environmental variables for upscaling SOCAT observations across U.S. LMEs. This kind of spatiotemporal upscaling has historically been accomplished using statistical interpolations29,30, multiple linear regressions31, and machine learning approaches9,24,32,33. We build upon the approach of Sharp et al.24 — who presented a monthly surface ocean CO2 partial pressure (pCO2) mapped product for the California Current System region called RFR-CCS — to train random forest regression (RFR) algorithms to predict surface CO2 fugacity (fCO2) from environmental variables that can be derived with spatial and temporal continuity across U.S. LMEs.

We advance the Sharp et al.24 approach by first clustering each LME into sub-regions with similar environmental variability using Gaussian mixture modelling. In addition to fCO2, we predict surface total alkalinity (and nutrients) from empirical property estimation algorithms that have been validated and published34. We use fCO2 and total alkalinity (AT) to compute eight additional OA indicators — partial pressure of CO2, total dissolved inorganic carbon, pH on the total scale, hydrogen ion amount content, carbonate ion amount content, saturation states for aragonite and calcite, and the Revelle factor — to produce monthly data products over 1998–2022 on a 0.25° × 0.25° resolution grid. We refer to these data products as RFR-LMEs35, which are freely available online and will be updated annually. Throughout this paper, we will use the term “mapped data products” to describe RFR-LMEs; “mapping” refers to the reconstruction of OA indicators on monthly, spatially continuous grids via the two-step approach of clustering on regional variability and applying trained RFR algorithms to gridded predictor variables.

This work was partially motivated by a partnership with the National Oceanic and Atmospheric Administration (NOAA) Ecosystem Indicators Working Group (EIWG), who manage the National Marine Ecosystem Status (NaMES) website (https://ecowatch.noaa.gov). The NaMES website was created to provide an at-a-glance overview of conditions in U.S. LMEs. These conditions are presented as indicators, which are quantitative and/or qualitative measures of key components of the ecosystem and span the following categories: climatological (e.g., El Niño Southern Oscillation index), physical–chemical (e.g., sea surface temperature), biological (e.g., chlorophyll-a concentration), and human dimensions (e.g., coastal county population). Indicator datasets are used by many NOAA stakeholders, such as fisheries managers, to monitor their ecosystems of interest and to assess the potential for future changes. Indicators included on the NaMES website must be theoretically sound, have demonstrable importance to the system, be relevant and understandable, show sensitivity to environmental variability or policy actions, and complement other indicators that are already served. This paper will describe the theoretical basis of RFR-LMEs and their relevance their respective ecosystems, to justify the use of RFR-LMEs as NaMES indicators of ocean acidification. The NaMES requirements also state that the data used to develop ecosystem indicators should be publicly available, quantitative, directly measurable, and updated on a regular basis; they stipulate that data should have adequate spatial coverage and that the time-series duration should be greater than 10 years and expected to continue for the foreseeable future. Because RFR-LMEs fit these requirements, we aggregate three mapped OA indicators from RFR-LMEs into monthly and annual regional averages of those indicators (and their uncertainties). Timeseries of the selected OA indicators (pCO2, pH on the total scale, and aragonite saturation state) are available on the NaMES website and, like the RFR-LME mapped data products, will be updated annually.

Methods

An overview of the methodological procedure to create RFR-LMEs is provided in Fig. 1. First, data were obtained from a variety of sources and bin-averaged or interpolated onto a consistent grid. Then, within each LME, a two-step cluster–regression strategy was employed. In the first step, spatial clusters were created using Gaussian mixture models (GMMs) based on variability in environmental predictors. In the second step, random forest regression (RFR) algorithms were trained for each cluster using fCO2(SOCAT) as the target variable and co-located environmental variables as predictors. These algorithms were then applied to gridded (0.25° × 0.25°) monthly environmental predictor fields to create monthly RFR-LME mapped data products of sea surface CO2 fugacity (fCO2(RFR-LME)). Applying GMMs on surface data to first divide each LME into subregions reduces the burden on the RFRs to represent many different regimes of dynamic variability at once. Therefore, the RFR algorithms are able to reconstruct sea surface fCO2 more accurately than if all data points from the entire LME were included in the algorithm training36. To create RFR-LMEs for the other indicators, sea surface total alkalinity and nutrient values were estimated, and carbonate system calculations were performed. Uncertainties were propagated through these calculations to obtain uncertainty estimates for each RFR-LME. Finally, RFR-LMEs were evaluated against independent datasets.

Fig. 1
figure 1

Schematic of the procedure used to construct the ocean acidification indicator data products described by this study. Steps within each section on the right (A, B, and C) are labelled in the schematic on the left.

Data sources

Surface ocean fCO2 observations were downloaded from the Surface Ocean CO2 Atlas Version 2023 (SOCATv2023; https://doi.org/10.25921/r7xa-bt92)37 in a large quadrangle surrounding North America and U.S. Pacific Islands with the following coordinates: 18°S to 82°N, 140°E to 58°W (Fig. 2). These observations were filtered by year (1998–2022), dataset flag (A, B, C, or D), and quality flag (q.f. = 2, good data), and binned into 0.25 degrees latitude by 0.25 degrees longitude monthly grid cells using platform-weighted averages. A spatial resolution of 0.25° × 0.25° was chosen largely for coherence with the majority of available predictor datasets. Platform-weighted averages mean that, within each latitude by longitude by month bin, a platform-specific (e.g., ship-only, mooring-only) average was first calculated, then an average was taken of those averages (if more than one platform was represented within the cell). This was done to mitigate unwanted biases toward high-resolution measurement systems. For validation exercises, this binning process was also repeated with only moored buoy observations and with a dataset that excluded moored buoy observations.

Fig. 2
figure 2

Observations used to develop ocean acidification indicator data products in the eleven U.S. large marine ecosystems (LMEs) considered in this study. LME boundaries are displayed with (a) platform-weighted fCO2 from SOCATv2023, averaged over 1998–2022 in 0.25 degrees latitude by 0.25 degrees longitude grid cells; (b) platform-weighted fCO2 variability from SOCATv2023, calculated as the standard deviation over time in 0.25 degrees latitude by 0.25 degrees longitude grid cells; (c) the total number months over the timeseries with at least one observation in each grid cell; (d) months of the seasonal cycle with at least one observation in each grid cell; (e) sea surface temperature from OISSTv238; (f) sea surface salinity from the CMEMS GLORYS reanalysis product39; (g) wind speed from the ECMWF ERA5 reanalysis product43; and (h) bathymetry from the ETOPO 2022 Global Relief Model44. LME name abbreviations shown in panel (a) are provided in Table 1.

Binned observations were grouped into eleven LMEs defined according to the United States Exclusive Economic Zone (EEZ), in accordance with the practice of the NOAA EIWG (Table 1; Fig. 2). Platform-weighted fCO2 from SOCATv2023 observations (fCO2(SOCAT)) in each of these grid cells over time shows large-scale patterns of spatial variability (Fig. 2a) — such as relatively high fCO2(SOCAT) at the equator and relatively low fCO2(SOCAT) surrounding Alaska, compared to the region as a whole — and temporal variability (Fig. 2b) — such as relatively high standard deviation in fCO2(SOCAT) observations surrounding Alaska and near the coastlines of the continental U.S. compared to the relatively low standard deviation in these observations around the Pacific Islands, again compared to the region as a whole. The distribution of the total number of months sampled within each 0.25° × 0.25° grid cell and the number of months of the year sampled at least once across the full dataset (1998–2022) within each grid cell reveal consistent patterns (Fig. 2c,d). The Northeast U.S. has especially high observational coverage (9.5% of all 0.25° × 0.25° monthly grid cells covered and 63.6% of seasonal seasonally binned 0.25° × 0.25° grid cells covered); the Southeast U.S. (4.0% total, 50.0% seasonal), Gulf of Mexico (5.2% total, 53.1% seasonal), Caribbean Sea (6.6% total, 40.2% seasonal), and California Current System (3.0% total, 42.1% seasonal) have moderately high observational coverage (Table 1). Observational coverage generally decreases farther offshore.

Table 1 Summary information for the eleven U.S. large marine ecosystems (LMEs) considered in this study.

Next, gridded fields of satellite, reanalysis, and in situ observational products were downloaded from the sources detailed in Table 2. When applicable, these fields were re-gridded using standard interpolation functions to match the resolution and/or central grid cell positions of the binned fCO2(SOCAT) observations. In many cases, multiple datasets could be chosen, but preference were given to those that were provided at 0.25° resolution and that covered the relevant time and space. Rigorous comparison between different input datasets is planned for future development of RFR-LMEs as they are prepared for dynamic, operational production. Sea surface temperature (SST; Fig. 2e) and ice concentration were obtained from the NOAA Optimum Interpolation Sea Surface Temperature version 2 (OISSTv2) product at daily, 0.25° × 0.25° resolution38; values were averaged to monthly resolution. Sea surface salinity (SSS; Fig. 2f) and mixed layer depth (MLD) were obtained from the Copernicus Marine Environment Monitoring Service (CMEMS) Global Ocean Ensemble Physics Reanalysis (GLORYS) product at monthly, 0.25° resolution39. Sea surface height (SSH) was obtained from the CMEMS satellite gridded product, which is produced at monthly, 0.25° resolution by optimal interpolation of along-track measurements from available altimeter missions40. Sea surface chlorophyll (CHL) was obtained from the National Aeronautics and Space Administration (NASA) Ocean Colour Level-3 Mapped Chlorophyll Data product at monthly, 1/12° resolution and re-gridded to 0.25° resolution41,42. One-dimensional, linear interpolation was used within each grid cell to fill gaps in the chlorophyll dataset. Wind speed (Fig. 2g) was obtained from the fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis for the global climate and weather (ERA5) at monthly, 0.25° resolution43. Bathymetry (Z, Fig. 2h) was obtained from the ETOPOv2022 Global Relief Model at 1/60° resolution and re-gridded to 0.25° resolution44. Sea level pressure (SLP) was obtained from the National Centers for Environmental Prediction/Department of Energy (NCEP/DOE) Reanalysis II model at monthly, 2.5° resolution and interpolated to 0.25° resolution45. Atmospheric pCO2 was obtained from the NOAA Marine Boundary Layer (MBL) product at weekly resolution and varying latitudinal resolution and was re-gridded to monthly, 0.25° resolution46. Binned observations of fCO2(SOCAT) were co-located in both time and space with the gridded predictors in preparation for algorithm training.

Table 2 Sources of gridded fields of satellite, reanalysis, and in situ observational products used to create RFR-LME maps.

As one of several validation exercises, the pCO2 reconstructions from our method were compared to other mapped pCO2 data products downloaded from SeaFlux (v2021.04)47, which is an ensemble of six surface pCO2 products that enables users to calculate air–sea CO2 flux consistently across the global ocean27. SeaFlux harmonizes six data-based pCO2 products: CMEMS-FFNN9,48, MPI-SOMFFN33,49, and NIES-FNN50, which are each constructed using neural networks; JENA-MLS30, which is constructed based on a mixed layer scheme; JMA-MLR10, which is constructed using multiple linear regressions; and CSIR-ML636, which is constructed using an ensemble of multiple machine-learning techniques. In an additional exercise, RFR-LME mapped data products were evaluated through comparisons to co-located, independently calculated OA indicators from research cruises included in the Global Ocean Data Analysis Project database (GLODAPv2.2022)51 and the Coastal Ocean Data Analysis Project – North America database (CODAP-NA)52. Measurements of SST, SSS, AT, CT, and nutrients were obtained from the GLODAP and CODAP-NA databases, then filtered to retain only observations with good quality flags for each of those variables that were collected at a depth of 10 meters or less.

Spatial clustering

Three clustering methods were tested: self-organizing mapping — a neural-network-based method of producing a low-dimensional representation of a set of input data — k-means clustering — an iterative method that optimizes a defined number of centroids by minimizing the in-cluster distances from the centroid for a multidimensional dataset — and Gaussian mixture modelling (GMM)53 — a method of clustering that assumes a multidimensional dataset is represented by a mixture of several Gaussian distributions with different properties. Of these methods, preliminary testing suggested GMM provided the best results in terms of k-fold cross-validated root-mean-square error (RMSE) in fCO2 (described in the following section) after RFRs were fit for each cluster. In addition, GMM clustering affords the benefit of providing probabilities that each spatiotemporal grid cell belongs within a given cluster instead of simply providing the cluster assignment to each grid cell as done in the other two clustering methods. These probabilities are used in our method to mitigate discontinuities at boundaries between clusters.

Variability (defined as the standard deviation within a spatial grid cell over time) in SLP, SST, and CHL were used as feature sets to form clusters in most LMEs; CHL was replaced with wind speed in two LMEs (BS and NBCS) due to insufficient CHL observations at high latitudes. The decisions to cluster based on variability over time instead of monthly values and to use the specified sets of variables were based on initial testing and optimization in terms of k-fold cross-validated RMSE in fCO2 (not shown). Future development of RFR-LMEs may continue to explore alternative clustering strategies.

GMM models with full, unshared covariance matrices were created using the MATLAB “fitgmdist” function. Full covariance matrices were used for GMM based on the a priori assumption that some of the predictor variables were correlated due to the nature of oceanographic environmental variables. Covariance matrices for GMM were unshared based on the a priori assumption that each spatial cluster had its own, different covariance matrix. The number of components (i.e., clusters; N in Table 1) was optimized, primarily by minimizing the k-fold cross-validated RMSE in fCO2, but also taking into account the Bayesian information criterion — a measure of model fit that includes a penalty for the number of clusters — and silhouette score — a measure of the accuracy of the clustering technique that is calculated by comparing each point’s similarity to the other points in its assigned cluster to how dissimilar it is to the points in the next nearest cluster (Fig. 3).

Fig. 3
figure 3

Example for Gaussian Mixture Model (GMM) optimization in the North Bering Chukchi Seas (NBCS). (a) Parameters used for GMM evaluation in the NBCS, plotted against the number of GMM clusters (N). The goal of the optimization procedure was to minimize the root mean squared error, maximize the global mean silhouette score, and identify the N at which the Bayesian information criterion was no longer sharply decreasing. N = 6 was ultimately selected for the NBCS region. (b) Distribution of GMM clusters for the NBCS and (c) the probability for grid cells belonging to cluster 1 (C1).

Machine learning regressions

Once the numbers of spatial clusters were determined for each LME, random forest regressions (RFRs)54 were trained for each cluster within each LME using binned fCO2(SOCAT) as a target variable and each of the co-located gridded variables listed in Table 2 along with longitude (degrees east with a 0° to 360° convention), latitude, distance from the coast, month of the year (sine- and cosine-transformed to maintain cyclicity throughout the year and predictability within each month), and year as predictors. These variables were found to be useful predictors of fCO2 by Sharp et al.24.

RFRs are a collection of regression “trees”, each of which is trained with a bootstrapped subset of the dataset. Each tree aims to generate a representation of the relationship between the predictor variables and the target variable for its bootstrapped subset of the data. This is done by splitting the data into a series of “branches” based on the predictors. At each branch point, only a random subset of the predictor variables is made available to the algorithm. The algorithm then optimally selects a predictor dataset and a specific value from that dataset on which to split the dataset into two additional branches/groups with the lowest possible within-group fCO2(SOCAT) variance. This continues until the branches become “leaves”, which means they are no longer split, either due to reaching a defined minimum leaf size or a certain criterion (e.g., variance of the remaining fCO2(SOCAT) observations). The use of an ensemble of regression trees constitutes the “forest” aspect of an RFR. The “randomness” aspect of the forest is due to the fact that each tree is constructed with different subsets of the full dataset and that different subsets of the predictors are available at each branch point, making it possible for each tree to provide a slightly different empirical regression for the dataset. New predictor data can be passed through each tree in the ensemble of a trained RFR, and an average of the values output from each tree is the fCO2 prediction (fCO2(RFR-LME)).

For each cluster, all grid cells with a GMM probability of greater than 10% for that cluster were used to train an RFR using the MATLAB “TreeBagger” function. This means that many grid cells on the geographic boundary between one or more clusters may then have been used to train multiple RFRs. The number of trees used for each RFR was set to 1000, which was confirmed to be sufficient through visual inspection of the out-of-bag RMSE with respect to the number of trees (not shown). The minimum leaf size was set to three based on k-fold cross-validation testing, and the number of predictors used for each decision split was set to 6 (equal to the total number of predictors divided by three and rounded up to the nearest whole number).

To create an RFR-LME map of fCO2 for each LME, all the gridded predictor variables (0.25° × 0.25°, monthly) within the LME were run through each cluster-specific RFR. This produced N fCO2(RFR-LME) maps for each LME, where N is equal to the number of clusters. These maps were then merged as weighted average fCO2(RFR-LME) maps using the GMM probabilities as weights, which helped to smooth out discontinuities between clusters. Lastly, RFR-LME maps of fCO2 were converted to maps of pCO2 (pCO2(RFR-LME)) using SST and SLP55.

Cross-validation was used to evaluate the skill of the fCO2(RFR-LME) estimates in each cluster and overall in each LME. This k-fold cross-validation was performed by sequentially withholding subsets of 20% of data, training versions of RFR algorithms with the remaining 80% of data, then, for each data point in the validation dataset, comparing the fCO2 obtained using the k-fold cross-validation algorithms (fCO2(RFR-LME-kFold)) to the observed fCO2(SOCAT) value. This procedure was repeated five times for each LME so all data points were included in the validation data once, producing ΔfCO2 values for each data point.

Alkalinity and nutrient estimation

Sea surface total alkalinity (AT), phosphate (PO4), and silicate (Si(OH)4) were estimated from gridded monthly fields of SSS and SST using Empirical Seawater Property Estimation Routines (ESPERs)34. ESPERs consist of both locally interpolated multiple linear regressions (ESPER-LIR) and feed-forward neural networks (ESPER-NN) trained to estimate seawater properties from a given set of input properties. Though ESPERs are global in nature, the regionally tuned ESPER-LIR coefficients and spatial coordinate predictors in ESPER-NNs mean that ESPERs function similarly to regional property estimation algorithms. ESPERs also provide the benefit of estimating uncertainty corresponding to each predicted value, allowing for the propagation of those uncertainties through downstream computations. The ESPER-Mixed routine (an average of both the ESPER-LIR and ESPER-NN approaches) was used for this study, due to assessment statistics that have indicated a lower global RMSE for the ESPER-Mixed approach (e.g., a global average RMSE of 3.7 μmol kg−1 for AT) compared to ESPER-LIR (4.0 μmol kg−1) and ESPER-NN (4.1 μmol kg−1) when producing property estimates from SSS and SST34.

Carbonate system calculations

CO2 system calculations were performed using CO2SYSv3 for MATLAB56 to determine additional ocean acidification (OA) indicators: dissolved inorganic carbon (CT(RFR-LME)), pH on the total scale (pHT(RFR-LME)), total hydrogen ion amount content ([H+]T(RFR-LME)), total carbonate ion amount content ([CO32−]T(RFR-LME)), saturation states for aragonite (Ωar(RFR-LME)) and calcite (Ωca(RFR-LME)), and Revelle factor (RF(RFR-LME)). These calculations were performed using well established thermodynamic equations describing the chemistry of carbon dioxide in seawater57,58. Input parameters to these equations were fCO2(RFR-LME), along with ESPER-estimated AT (AT(ESPER)), phosphate (PO4(ESPER)), and silicate (Si(OH)4(ESPER)). Carbonic acid dissociation constants from Lueker et al.59, the boric acid dissociation constant from Dickson60, the total boron to salinity ratio from Lee et al.61, the dissociation constant of water from Dickson62, and the hydrofluoric acid dissociation constant from Perez and Fraga63 were used in CO2 system calculations. Uncertainties were propagated through these calculations (see following section).

Uncertainty estimation

Uncertainties in RFR-LME maps of fCO2 were evaluated based on the previously described k-fold cross-validation approach. First, spatially gridded absolute values of ΔfCO2 from k-fold cross-validation were low-pass filtered (using 0.5° × 0.5° windows) two times in each LME to begin to fill nearby grid cells with uncertainty values. Then, nearest-neighbor interpolation was used to fill any remaining empty grid cells with data-based, spatially scaled uncertainty values (\({{\rm E}}_{{fCO}2(s)}\)). This approach only assesses the strength of the fit for available. It is therefore prudent to assign greater uncertainties for periods and regions where training data are less abundant or absent. For this reason, the \({{\rm E}}_{{fCO}2(s)}\) values were further scaled over time by calculating two scaling factors specific to each LME, one representing the seasonal data coverage (using 3-month running means of the relative data coverage across the seasonal cycle) and another representing the relative annual data coverage (using 5-year running means of the relative data coverage across the timeseries).

The seasonal scaling factor (\({{\rm{\varepsilon }}}_{{seas}.}\)) was calculated as:

$${{\rm{\varepsilon }}}_{seas.}=\frac{({\sum }_{my=m{y}_{ref}-1}^{m{y}_{ref}+1}{n}_{obs(my)}/{n}_{tot(my)})/MY}{{n}_{obs(my)}/{n}_{tot(my)}}$$

where my is the numbered month of the year (1–12), myref is the reference month of the year for each time step (1–12), nobs(my) is the number of grid cells with observations in the corresponding month of the year, ntot(my) is the total number of available grid cells in the corresponding month of the year, and MY is the total number of months considered within the window for each time step. Because January (1) comes after December (12), myref − 1 = 12 when myref = 1 and myref + 1 = 1 when myref = 12. The long-term scaling factor (\({{\rm{\varepsilon }}}_{{ann}.}\)) was calculated as:

$${{\rm{\varepsilon }}}_{ann.}=\frac{({\sum }_{ms=m{s}_{ref}-24}^{m{s}_{ref}+24}{n}_{obs(ms)}/{n}_{tot(ms)})/MS}{{n}_{obs(ms)}/{n}_{tot(ms)}}$$

where ms is the numbered month in the full series (1–228), msref is the reference month in the series for each time step (1–228), nobs(ms) is the number of grid cells with observations in the corresponding month of the series, ntot(ms) is the total number of available grid cells in the corresponding month, and MS is the total number of months considered within the window for each time step. Fewer months were considered within each window near the beginning and end of the time series. Finally, the estimated uncertainty of fCO2(RFR-LME) scaled spatially and temporally (i.e., seasonally and annually) was calculated as:

$${{\rm E}}_{{fCO}2(s,t)}={{\rm E}}_{{fCO}2(s)}\times {{\rm{\varepsilon }}}_{{seas}.}\times {{\rm{\varepsilon }}}_{{ann}.}.$$

The window sizes of the scalers were selected to balance data coverage in each time window with realistic periods of time over which observational data may exhibit serial correlations.

Uncertainties in ESPER-estimated AT and nutrients were provided by the ESPER algorithms, which estimate uncertainty using a polynomial fit to salinity and depth. The ESPER algorithms are less skillful in the surface ocean where we use them than they are globally across all depths, and the uncertainty estimates are correspondingly greater at shallow depths.

The uncertainty estimates were propagated along with standard estimated total uncertainties in carbonate system constants (see Table 1 in Orr et al.64) to calculate uncertainty in mapped OA indicators. Gaussian uncertainty propagation was employed, using CO2SYSv3 for MATLAB56, which is based on uncertainty propagation code introduced in CO2SYSv2 by Orr et al.64.

Validation and evaluation

The skill of the RFR-LME maps was evaluated through comparisons with co-located OA indicators independently calculated from the ship-based GLODAPv2.2022 and CODAP-NA measurements described above. OA indicators were computed at in situ temperature from the AT and CT observations using CO2SYSv3 for MATLAB56 and the same equilibrium constants as before. Although the GLODAPv2.2022 and CODAP-NA databases also include pHT and pCO2 measurements, they are not as widespread as AT and CT measurements, so we chose to calculate all indicators from AT and CT for evaluation. Each observation was then co-located with the corresponding RFR-LME grid cell and compared.

In addition, RFR-LME maps were compared to global mapped data products of sea surface pCO2 obtained from SeaFlux (v2021.04)47. Long-term averages of pCO2 from RFR-LME maps and SeaFlux maps were computed across the overlapping time periods of both products (i.e., 1998–2019). Mapped differences between RFR-LME and each SeaFlux ensemble member, as well as an average across the ensemble, were computed and compared.

Finally, observations of pCO2 at fixed buoy locations were compared to pCO2 from RFR-LME data products at grid cells corresponding to those moored buoy observations. For this exercise, special-case RFR-LME maps were created by training RFRs on gridded fCO2(SOCAT) data with buoy observations excluded, then using those algorithms to construct the maps. Comparing pCO2 mapped from datasets both with and without the underlying buoy observations allowed for evaluation of the influence that those seasonally resolved observations have on the fidelity of the pCO2 reconstruction. pCO2 values extracted from the mapped SeaFlux datasets were also included in this comparison, allowing for separate evaluation of how the LME-scale, 0.25° × 0.25° monthly reconstructions compare to global 1° × 1° monthly reconstructions.

Data Records

RFR-LME maps can be accessed through the NOAA National Centers for Environmental Information (NCEI) via the Ocean Carbon and Acidification Data System (OCADS; https://doi.org/10.25921/h8vw-e872)35. The dataset is available in NetCDF format on 0.25° × 0.25° spatial grids at monthly timesteps. Each mapped OA indicator and its uncertainty is provided via a separate NetCDF file, along with a reference grid that indicates to which LME each spatial grid cell belongs. Additionally, regional timeseries for CO2 partial pressure, calcium carbonate saturation state, and pH are displayed at the NOAA Marine Ecosystem Status website (https://ecowatch.noaa.gov). Average values, trends, seasonal amplitudes, and uncertainty estimates of ocean acidification indicators from RFR-LMEs vary considerably among the regions (Tables 36).

Table 3 Long-term mean values for OA indicators in each LME.
Table 4 Long-term trends and uncertainties for OA indicators in each LME.
Table 5 Seasonal amplitudes for OA indicators in each LME.
Table 6 Average uncertainties for OA indicators in each LME.

Long-term means (Table 3; Fig. 4) allow for the description of LME-scale patterns in surface ocean carbonate chemistry. Tropical LMEs (PI and CS) are characterized most notably by high carbonate ion parameters ([CO32−]T, Ωar, and Ωca) and low RF values. Within this pair, the CS can be described as more acidified (higher pCO2 and lower pHT) but better buffered (lower RF and higher AT/CT ratio). Subtropical Atlantic LMEs (GM and SE) also have high carbonate ion parameters ([CO32−]T, Ωar, and Ωca) and low RF values. Compared to the Tropical LMEs however, Subtropical Atlantic LMEs have higher CT and AT values, although AT/CT ratios and therefore RFs are similar between the two groups. Temperate and subarctic coastal LMEs (CCS, GA, and NE) can generally be considered intermediate in all parameters: pCO2, pH, carbonate ion parameters, CT, AT, and RF. Within the group, the GA has the highest RF and lowest carbonate ion parameters, the NE has the lowest RF and highest carbonate ion parameters, and the CCS is between the two. Subarctic North Pacific LMEs (AI and EBS) are characterized by high CT, pCO2, and RF; and low pHT and carbonate ion parameters. Arctic LMEs (NBCS and BS) are characterized by high pHT and RF; and low AT, pCO2, and carbonate ion parameters.

Fig. 4
figure 4

Long-term means of RFR-LME mapped OA indicators. Mapped averages of (a) pCO2(RFR-LME), (b) CT(RFR-LME), (c) pHT(RFR-LME), (d) [H+]T(RFR-LME), (e) Ωar(RFR-LME), (f) Ωca(RFR-LME), (g) [CO32−]T(RFR-LME), and (h) RF(RFR-LME) over the timeseries (1998–2022) within each LME.

Spatial variability in OA indicators is evident within each LME and throughout the seasonal cycle (Fig. 5). For example, the CCS develops a strong dipole in the summer (June/July/August; Fig. 5c), with low CT off the coast in the northern CT and high CT off the coast in the central CCS. This dipole becomes much weaker in the winter (December/January/February; Fig. 5a). Similarly, relatively low CT occurs off the coast in the northern NE region in the summer but disappears in the winter (Fig. 5c). The southern continental Alaskan coastline exhibits low CT, especially nearshore in the summer, whereas the northern Alaskan coastline is relatively higher in CT than nearby offshore waters in the Arctic Ocean. A band of relatively low CT is evident from about 10° to 20° N in the PI region, between higher CT in the equatorial Pacific and North Pacific subtropical gyre, a feature that has appeared in other sea surface CT data products6.

Fig. 5
figure 5

Seasonal means of CT(RFR-LME). Mapped averages of CT(RFR-LME), in the northern hemisphere (a) winter (DJF), (b) spring (MAM), (c) summer (JJA), and (d) fall (SON) over the timeseries (1998–2022) within each LME.

Mapped indicator uncertainties (see Fig. 6) are served alongside RFR-LME maps35, providing a resource for evaluating uncertainty in OA indicator values at a given location. Area-weighted mean u[pCO2(RFR-LME)] was 12.0 μatm across the entire domain, u[pHT(RFR-LME)] was 0.015, and uar(RFR-LME)] was 0.18. These domain-wide means are influenced by the large area and low uncertainties in the Pacific Islands region; individual LME uncertainties, particularly in Artic and Subarctic LMEs, may be considerably larger. Spatial patterns of uncertainties also differ for different OA indicators. For example, uar(RFR-LME)] tends to be relatively high in the tropical LMEs (Fig. 6d), where Ωar(RFR-LME) is also high (Fig. 4e); on the other hand, u[pCO2(RFR-LME)] is extremely low in tropical the LMEs (Fig. 6b).

Fig. 6
figure 6

k-fold cross-validated differences between estimated and measured fCO2 and scaled uncertainty in fCO2(RFR-LME), AT(ESPER), and Ωar(RFR-LME). (a) Absolute differences (|ΔfCO2|) between fCO2(RFR-LME-kFold) and fCO2(SOCAT) calculated via a k-fold cross-validation approach compared to (b) long-term average uncertainty in fCO2(RFR-LME-kFold), calculated and scaled according to the procedure in the Uncertainty estimation section. Long-term average uncertainty in (c) AT(ESPER) and (d) Ωar(RFR-LME) are also shown.

Uncertainty values reflect not only uncertainty in the RFR predictions, but also uncertainty introduced by interpolating over spatial and temporal gaps in observational coverage. Average uncertainty values for each LME are presented alongside OA indicator timeseries on the NOAA NaMES website. Importantly, the uncertainty values provided in Table 6 and on the NaMES website represent weighted means of grid-cell-level uncertainties rather than uncertainties corresponding to region-wide averages, which may or may not be smaller due to cancelling errors that are removed by areal averaging or larger due to inadequacies of our spatiotemporal scaling approach for representing uncertainties in under-sampled times and locations.

Technical Validation

Data-based validation

A k-fold cross-validation approach was used to assess the skill of the fCO2 estimates and subsequent OA indicator calculations. Region-wide error statistics for each of the eleven LMEs (before the spatial and temporal scaling) indicate that fCO2(RFR-LME-kFold) values are centered around (mean and median errors all close to zero) and tend to correlate closely with (nine of the eleven R2 values are 0.8 or greater) the measured values of fCO2(SOCAT) (Table 7). Root mean square errors (RMSEs) are generally about three times larger than median absolute errors, indicating error populations with long tails of a few particularly large errors. When viewed spatially (Fig. 6a), absolute differences (|ΔfCO2|) between fCO2(RFR-LME-kFold) and fCO2(SOCAT) are greatest near the coast and in the North Pacific and Arctic, and smallest in the open ocean and in the tropics and subtropics. High |ΔfCO2| values tend to correlate with areas of high background variability in fCO2(SOCAT) (Fig. 2b), emphasizing that the RFR algorithms may struggle to capture extreme values, which is consistent with the aforementioned long-tailed error populations.

Table 7 Error statistics of fCO2 predicted by k-fold cross-validation algorithms (fCO2(RFR-LME-kFold)) compared to fCO2 from SOCAT observations (fCO2(SOCAT)).

Comparison to global trends

RFR-LME indicator timeseries (1998–2022) represent spatially weighted annual averages of OA indicators computed from RFR-LME maps. Increasing pCO2(RFR-LME) and decreasing pHT(RFR-LME) are observed in each LME (Figs. 7, 8) — trends that are strongly influenced by anthropogenic CO2 uptake and amplified by ocean warming (Table 8). Ωar(RFR-LME) decreases in many (but not all) LMEs over 1998–2022 (Fig. 9), as Ωar decline is driven by anthropogenic CO2 uptake as well, but moderated by ocean warming and also influenced by changes in SSS (Table 8). Trends in OA indicators across U.S. LMEs (Table 4) can be compared with global trends of about + 1.5 μatm yr–1 for pCO2 (+0.3 to +1.8 μatm yr–1 for RFR-LMEs), +0.9 μmol kg–1 yr–1 for CT (–0.2 to +1.0 μmol kg–1 yr–1 for RFR-LMEs), –1.7·10–3 units yr–1 for pHT (–1.8·10–3 to –0.2·10–3 yr–1 for RFR-LMEs), and –7.0·10–3 yr–1 for Ωar1,4,6,10 (–7.3·10–3 to +2.6·10–3 yr–1 for RFR-LMEs).

Fig. 7
figure 7

pCO2 timeseries for the eleven U.S. LMEs considered in this study. Area-weighted annual (red) and monthly (black) means are presented along with envelopes of uncertainty. Uncertainties are calculated by scaling k-fold cross-validated uncertainties spatially with a two-dimensional low-pass filter, then temporally according to long-term data coverage (5-year running windows) and seasonal data coverage (3-month running windows). Average values across the timeseries are indicated by dotted lines. Note the different y-axis for each LME. For many LMEs, uncertainties are larger near the beginning of the timeseries, when SOCAT observations are less dense. Inset within each timeseries plot is a figure showing the percent of the LME represented across the seasonal cycle by grid cells that remain ice-free across the entire timeseries; these are the grid cells used to compute monthly and annual means.

Fig. 8
figure 8

pHT timeseries for the eleven U.S. LMEs considered in this study. Same as Fig. 7, except for pHT.

Table 8 Long-term mean values, trends, and seasonal amplitudes for temperature and salinity in each LME.
Fig. 9
figure 9

Ωar timeseries for the eleven U.S. LMEs considered in this study. Same as Fig. 7, except for Ωar.

It is important to note that, for some of the Arctic and subarctic LMEs that are characterized by high seasonal ice coverage, these trends are driven by primarily summertime OA indicator values (see inset plots in Figs. 79). This limitation, along with the fact that these timeseries are relatively short (25 years) and regionally limited, can explain divergence in some specific cases from the global trends.

Comparison to discrete shipboard data

The RFR-LME fields presented in this work are constructed using surface CO2 measurements from shipboard flow-through analyzers. This automated observational approach allows for the collection of high spatial and temporal resolution observations of surface ocean carbonate chemistry. Discrete bottle measurements of carbonate chemistry parameters represent another approach for monitoring ocean acidification. The discrete approach allows for high-quality observations throughout the water column. Here we take near-surface discrete bottle measurements of AT and CT from GLODAPv2.202251 and CODAP-NA52, use those measurements to calculate OA indicators, and compare those calculated values with mapped surface OA indicators from RFR-LMEs.

RFR-LME indicator values are generally in good agreement with calculations from discrete bottle measurements (Fig. 10). Compared to the k-fold-validation-based uncertainty estimates (Table 6), a greater spread (i.e. larger IQRs) in the differences between GLODAP/CODAP and RFR-LME values is expected in this exercise for two reasons. First, uncertainty stemming from CO2 system calculations will contribute to the spread (e.g., Orr et al.64), since GLODAP/CODAP indicators values are calculated from AT and CT and RFR-LME indicator values are calculated from fCO2 and AT. As an example, average propagated uncertainties for GLODAP/CODAP calculations using standard measurement errors for AT and CT (2 μmol kg−1 for both) and for equilibrium constants64 were calculated as 12.0 μatm for pCO2, 0.014 for pHT, and 0.11 for Ωar. In addition, the two datasets are fundamentally different in their spatiotemporal resolution. RFR-LME grid cells represent averages for large swaths of the surface ocean over a monthly timestep, whereas shipboard measurements are appropriate for a distinct point in space at a distinct time. This spatiotemporal mismatch is especially noteworthy in the coastal ocean where diurnal and other sub-monthly modes of variability operate over spatial scales much finer that 0.25 degrees of latitude or longitude. The calculations from bottle measurements also tend to indicate higher pCO2 and therefore lower pHT and Ωar. These offsets between the two datasets may be partly related to inconsistencies in carbonate chemistry calculations, whereby calculations from AT and CT at most surface conditions tend to produce lower pHT (and higher pCO2) values than corresponding measurements of those properties65,66.

Fig. 10
figure 10

Comparisons between OA indicators retrieved from RFR-LME maps and those calculated from discrete observations. (a,c,e) Histograms showing differences between calculations of (a) pCO2, (c) pHT, and (e) Ωar from discrete surface (depth ≤ 10 m) observations of AT and CT and values of the same OA indicators from RFR-LME maps. Discrete observations that fall within the boundaries of LMEs were obtained from the GLODAPv2.202251 and CODAP-NA52 data products. Error statistics shown represent the median errors and the interquartile ranges of errors for each comparison. (b,d,f) Mapped differences between calculations of (b) pCO2, (d) pHT, and (f) Ωar from discrete surface (depth ≤ 10 m) observations of AT and CT and values of the same OA indicators from RFR-LME maps. Discrete differences are binned into 1° × 1° grid cells for this map.

Comparison to moored buoy time series data

Timeseries of pCO2 from fixed grid cells of RFR-LME maps and RFR-LME maps constructed without moored buoy observations (RFR-LME-NM) were compared to pCO2 observations at fixed buoy locations that were extracted from the SOCATv2023 database and aggregated in monthly bins. This provides a test of the capacity of RFR-LMEs to reproduce monthly variability in validation measurements that were withheld from training, and can be considered an assessment of the RFR-LME skill with monthly variability generally. Mapped global data products of surface ocean carbonate chemistry obtained from SeaFlux27,47 were also compared to the moored buoy observations.

Differences between moored buoy observations and mapped products (Table 9; Fig. 11) suggest that timeseries extracted from our regionally focused RFR-LME maps more meaningfully reflect observed pCO2 than those from mapped global products. Like RFR-LME, most of these alternative products were trained from versions of SOCAT that include the buoy observations. The average median ( ± IQR) ΔpCO2 (pCO2(moor.)pCO2(grid)) was 0.1 ± 23.6 μatm for RFR-LME and increased to −13.0 ± 49.2 μatm for the RFR-LME-NM product, which excluded these observations from the training data. These increased error statistics emphasize the value of moored buoy observations for the surface CO2 observing system. Still, all but one (JENA-MLS; ΔpCO2 = −2.2 ± 48.7) of the mapped data products from SeaFlux exhibited more variability in their differences from buoy observations than even the version of RFR-LME constructed without moored buoy observations. JENA-MLS may perform better at representing pCO2 at these mooring sites because it explicitly models mixed layer fluxes and processes rather than relying on empirical relationships learned from large sets of data.

Table 9 Medians and interquartile ranges (μatm) of comparisons between moored buoy observations and corresponding grid cells from mapped monthly sea surface pCO2 data products.
Fig. 11
figure 11

Summarized differences between monthly binned moored buoy pCO2 observations and mapped pCO2 data products. (a) Medians and (b) interquartile ranges of differences, (c) differences in seasonal amplitudes, and (d) correlations of residual values (after removing the trend and seasonal cycle) between binned moored buoy pCO2 observations and mapped pCO2 data products. Each of these statistics are shown as boxplots representative of all 14 mooring sites compared to each mapped product, where the boxes extend from the 25th to 75th percentile, the center line shows the median of the data, the whiskers extend to the most extreme data points not considered outliers, and dots denote outliers (arrows denote where outliers do not appear within the axis limits). RFR-LME is the product described in this work and RFR-LME-NM is the constructed using the same method but without moored buoy observations. References for the SeaFlux mapped products are provided in the Data sources section.

Individual timeseries from moored buoy sites (Fig. 12) emphasize the significant seasonal and interannual variability in buoy pCO2 observations (black dots), even when aggregated in monthly bins, and the challenge for mapped products (colored lines) to accurately capture each of those variations at a local scale. The performance of the regional RFR-LME maps compared to the global mapped products reinforces the notion that locally specific relationships captured by training machine learning algorithms at the scale of objectively defined clusters within LMEs can resolve fine-scale variations in ocean biogeochemistry more effectively than global-scale algorithms, even though those global-scale algorithms are trained with a larger amount of data24,67. Positive trends in pCO2 superimposed upon seasonal variations are visible in both the moored buoy observations and mapped data product timeseries (Fig. 12).

Fig. 12
figure 12

Comparisons between pCO2 from selected moored buoy observations, RFR-LME and RFR-LME-NM maps, and other mapped surface products. (a) Mapped long-term mean pCO2(RFR-LME) along with mean pCO2 from moored buoy observations (shaded dots). Colors of the arrows in the map correspond to the colors of the outlines of timeseries plots from grid cells that match the buoy locations. (bi) Each timeseries shows buoy observations aggregated into monthly bins (black dots), corresponding timeseries from RFR-LME maps (red solid lines) and RFR-LME-NM maps (blue dashed lines), and corresponding timeseries from mapped global data products included in SeaFlux47 (thin colored lines).

Comparison to mapped data products

Finally, RFR-LME surface pCO2 was compared directly to the six global-scale mapped products of pCO2 from SeaFlux across the overlapping interval between them (1998–2019). Maps of average surface pCO2 display similar patterns across all six SeaFlux products, but differences between those products and RFR-LME (ΔpCO2 = pCO2(SeaFlux)pCO2(RFR-LME)) reveal subtle regional differences (Fig. 13). SeaFlux provides a pCO2 filler field derived from Landschützer et al.68 to fill spatial gaps in global surface products; this gap filler is not used to produce the difference maps displayed in Fig. 13. However, for spatial consistency, it is used to calculate the averages and standard deviations of the differences for each data product shown in Fig. 13.

Fig. 13
figure 13

Comparisons between mapped surface pCO2 products and RFR-LME maps. Long-term mean pCO2 for each of the SeaFlux47 mapped global products (a,c,e,g,i,k) and differences between those products and RFR-LME (ΔpCO2 = pCO2(SeaFlux)pCO2(RFR-LME); b,d,f,h,j,m) are shown. Area-weighted averages and standard deviations of ΔpCO2 are provided above each set of two figures.

In the tropical Pacific, RFR-LME maps agreed well with all products but NIES-FNN, where a prevailing negative bias is evident in that product. In the Atlantic, RFR-LME maps generally agreed well, with visible biases in the Mississippi plume (CSIR-ML6), Georges Bank (JMA-MLR), Caribbean (JENA-MLS), and throughout (NIES-FNN). Coastal negative biases are visible for most products in the central CCS region, and coastal positive biases are visible in the northern CCS region. Both positive and negative biases occur in the regions surrounding Alaska, where low observational density likely leads to significant diversity in pCO2 estimates among the gap-filling approaches.

Despite these regional discrepancies with some individual products, the median (±1 IQR) ΔpCO2 for the ensemble average of all six SeaFlux products is 0.8 ± 16.6 μatm. This indicates that RFR-LME — which represents local-scale temporal variability in surface pCO2 more effectively than global products (Table 9; Fig. 11) — agrees at broad scales with observation-based products that are well accepted and widely used by community-wide synthesis efforts such at the Global Carbon Budget2 and REgional Carbon Cycle Assessment and Processes Project (RECCAP2)69.