A mapped dataset of surface ocean acidification indicators in large marine ecosystems of the United States

Sharp, Jonathan D.; Jiang, Li-Qing; Carter, Brendan R.; Lavin, Paige D.; Yoo, Hyelim; Cross, Scott L.

doi:10.1038/s41597-024-03530-7

Data Descriptor
Open access
Published: 02 July 2024

Volume 11, article number 715, (2024)

Scientific Data

Abstract

Mapped monthly data products of surface ocean acidification indicators from 1998 to 2022 on a 0.25° by 0.25° spatial grid have been developed for eleven U.S. large marine ecosystems (LMEs). The data products were constructed using observations from the Surface Ocean CO₂ Atlas, co-located surface ocean properties, and two types of machine learning algorithms: Gaussian mixture models to organize LMEs into clusters of similar environmental variability and random forest regressions (RFRs) that were trained and applied within each cluster to spatiotemporally interpolate the observational data. The data products, called RFR-LMEs, have been averaged into regional timeseries to summarize the status of ocean acidification in U.S. coastal waters, showing a domain-wide carbon dioxide partial pressure increase of 1.4 ± 0.4 μatm yr⁻¹ and pH decrease of 0.0014 ± 0.0004 yr⁻¹. RFR-LMEs have been evaluated via comparisons to discrete shipboard data, fixed timeseries, and other mapped surface ocean carbon chemistry data products. Regionally averaged timeseries of RFR-LME indicators are provided online through the NOAA National Marine Ecosystem Status web portal.

Background & Summary

The accumulation of carbon dioxide (CO₂) in the atmosphere as a result of human activities, and the uptake of ~25% of anthropogenic CO₂ by the ocean^1,2, has led to increasing acidity of ocean waters of about −0.016 pH units per decade on a global scale since the 1980s^3,4,5,6. This ocean acidification (OA) signal is measurable at time series sites^7,8, observed in mapped data products of CO₂ partial pressure^6,9,10,11,12, captured by decadal repeat hydrographic cruises^13,14, and simulated by ocean models¹⁵ and coupled Earth system models^5,16,17. Superimposed on steady increases in accumulated anthropogenic carbon (C_ant) and decreases in ocean pH, however, are various modes of temporal (e.g., diurnal, seasonal, interannual) and spatial (e.g., latitudinal, nearshore–offshore) variability, which are particularly pronounced in coastal ecosystems. This variability in coastal OA brings unique impacts to marine organisms that reside in coastal zones and are vulnerable to corrosive waters¹⁸.

Large marine ecosystems (LMEs) are ocean regions that border coastlines and are characterized by distinct bathymetry, hydrography, productivity, and trophic structure¹⁹. LMEs encompass estuaries and river mouths, nearshore coastal zones, continental shelves, and the outer margins of ocean current systems. Typically, the offshore boundary of an LME extends to the continental shelf break or to the seaward edge of a current system. Due to their coastal proximity, LMEs tend to be natural hotspots of variability in carbon cycling and rapid exchange between carbon pools. For example, intense surface primary productivity in the coastal ocean is fueled by nutrients from river input, atmospheric deposition, and coastal upwelling¹⁸; sinking organic matter from surface production leads to intense respiration throughout the water column and at the seafloor²⁰; and high rates of sedimentation are observed in LMEs from both biogenic and lithogenic inputs²¹.

Ongoing anthropogenic climate drivers coupled with the natural processes occurring in coastal ecosystems make it challenging to attribute modes of OA variability to the appropriate driving mechanisms. For example, anthropogenic eutrophication from freshwater runoff and atmospheric pollution can augment natural nutrient inputs, leading to even greater net primary production in coastal surface waters and greater respiration in subsurface waters. Whereas the direct effect of CO₂ uptake by primary producers mitigates OA at the surface, highly respired subsurface waters can be laterally transported and upwelled onto the continental shelf, leading to enhanced OA in the surface waters there^18,22. These and other OA-modulating processes differ across ecosystems²³, but their impacts are frequently correlated with environmental driver variables such as sea surface height, temperature, salinity, and chlorophyll-a concentration. These correlations allow OA metrics to be reconstructed from measurements and data products that are available at high spatial and temporal resolution^11,24,25.

The data product described here is based on direct observations, which are used to reconstruct a recent history of surface ocean OA indicators at monthly, 0.25° resolution in U.S. LMEs. Observations are from a publicly available, annually updated database of surface CO₂ observations: the Surface Ocean CO₂ Atlas (SOCAT)²⁶. SOCAT is an international data synthesis effort that has facilitated the production of global surface CO₂ flux maps²⁷ that contribute data-constrained estimates of the ocean CO₂ sink in the Global Carbon Budget²⁸. We also rely on publicly available satellite-derived surface ocean properties and data reanalysis products to leverage the predictive power of environmental variables for upscaling SOCAT observations across U.S. LMEs. This kind of spatiotemporal upscaling has historically been accomplished using statistical interpolations^29,30, multiple linear regressions³¹, and machine learning approaches^9,24,32,33. We build upon the approach of Sharp et al.²⁴ — who presented a monthly surface ocean CO₂ partial pressure (pCO₂) mapped product for the California Current System region called RFR-CCS — to train random forest regression (RFR) algorithms to predict surface CO₂ fugacity (fCO₂) from environmental variables that can be derived with spatial and temporal continuity across U.S. LMEs.

We advance the Sharp et al.²⁴ approach by first clustering each LME into sub-regions with similar environmental variability using Gaussian mixture modelling. In addition to fCO₂, we predict surface total alkalinity (and nutrients) from empirical property estimation algorithms that have been validated and published³⁴. We use fCO₂ and total alkalinity (A_T) to compute eight additional OA indicators — partial pressure of CO₂, total dissolved inorganic carbon, pH on the total scale, hydrogen ion amount content, carbonate ion amount content, saturation states for aragonite and calcite, and the Revelle factor — to produce monthly data products over 1998–2022 on a 0.25° × 0.25° resolution grid. We refer to these data products as RFR-LMEs³⁵, which are freely available online and will be updated annually. Throughout this paper, we will use the term “mapped data products” to describe RFR-LMEs; “mapping” refers to the reconstruction of OA indicators on monthly, spatially continuous grids via the two-step approach of clustering on regional variability and applying trained RFR algorithms to gridded predictor variables.

This work was partially motivated by a partnership with the National Oceanic and Atmospheric Administration (NOAA) Ecosystem Indicators Working Group (EIWG), who manage the National Marine Ecosystem Status (NaMES) website (https://ecowatch.noaa.gov). The NaMES website was created to provide an at-a-glance overview of conditions in U.S. LMEs. These conditions are presented as indicators, which are quantitative and/or qualitative measures of key components of the ecosystem and span the following categories: climatological (e.g., El Niño Southern Oscillation index), physical–chemical (e.g., sea surface temperature), biological (e.g., chlorophyll-a concentration), and human dimensions (e.g., coastal county population). Indicator datasets are used by many NOAA stakeholders, such as fisheries managers, to monitor their ecosystems of interest and to assess the potential for future changes. Indicators included on the NaMES website must be theoretically sound, have demonstrable importance to the system, be relevant and understandable, show sensitivity to environmental variability or policy actions, and complement other indicators that are already served. This paper will describe the theoretical basis of RFR-LMEs and their relevance to their respective ecosystems, to justify the use of RFR-LMEs as NaMES indicators of ocean acidification. The NaMES requirements also state that the data used to develop ecosystem indicators should be publicly available, quantitative, directly measurable, and updated on a regular basis; they stipulate that data should have adequate spatial coverage and that the time-series duration should be greater than 10 years and expected to continue for the foreseeable future. Because RFR-LMEs fit these requirements, we aggregate three mapped OA indicators from RFR-LMEs into monthly and annual regional averages of those indicators (and their uncertainties). Timeseries of the selected OA indicators (pCO₂, pH on the total scale, and aragonite saturation state) are available on the NaMES website and, like the RFR-LME mapped data products, will be updated annually.

Methods

An overview of the methodological procedure to create RFR-LMEs is provided in Fig. 1. First, data were obtained from a variety of sources and bin-averaged or interpolated onto a consistent grid. Then, within each LME, a two-step cluster–regression strategy was employed. In the first step, spatial clusters were created using Gaussian mixture models (GMMs) based on variability in environmental predictors. In the second step, random forest regression (RFR) algorithms were trained for each cluster using fCO_2(SOCAT) as the target variable and co-located environmental variables as predictors. These algorithms were then applied to gridded (0.25° × 0.25°) monthly environmental predictor fields to create monthly RFR-LME mapped data products of sea surface CO₂ fugacity (fCO_2(RFR-LME)). Applying GMMs on surface data to first divide each LME into subregions reduces the burden on the RFRs to represent many different regimes of dynamic variability at once. Therefore, the RFR algorithms are able to reconstruct sea surface fCO₂ more accurately than if all data points from the entire LME were included in the algorithm training³⁶. To create RFR-LMEs for the other indicators, sea surface total alkalinity and nutrient values were estimated, and carbonate system calculations were performed. Uncertainties were propagated through these calculations to obtain uncertainty estimates for each RFR-LME. Finally, RFR-LMEs were evaluated against independent datasets.

Data sources

Surface ocean fCO₂ observations were downloaded from the Surface Ocean CO₂ Atlas Version 2023 (SOCATv2023; https://doi.org/10.25921/r7xa-bt92)³⁷ in a large quadrangle surrounding North America and U.S. Pacific Islands with the following coordinates: 18°S to 82°N, 140°E to 58°W (Fig. 2). These observations were filtered by year (1998–2022), dataset flag (A, B, C, or D), and quality flag (q.f. = 2, good data), and binned into 0.25 degrees latitude by 0.25 degrees longitude monthly grid cells using platform-weighted averages. A spatial resolution of 0.25° × 0.25° was chosen largely for coherence with the majority of available predictor datasets. Platform-weighted averages mean that, within each latitude by longitude by month bin, a platform-specific (e.g., ship-only, mooring-only) average was first calculated, then an average was taken of those averages (if more than one platform was represented within the cell). This was done to mitigate unwanted biases toward high-resolution measurement systems. For validation exercises, this binning process was also repeated with only moored buoy observations and with a dataset that excluded moored buoy observations.
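The platform-weighted binning described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the tuple layout of the observations, and the sample values are hypothetical, and cell edges are assumed to fall on multiples of the 0.25° resolution.

```python
from collections import defaultdict
from statistics import mean

def platform_weighted_bins(obs, res=0.25):
    """obs: iterable of (lat, lon, year, month, platform, fco2) tuples (hypothetical layout)."""
    # Group values by (grid cell, year, month), then by platform within each bin.
    groups = defaultdict(lambda: defaultdict(list))
    for lat, lon, year, month, platform, fco2 in obs:
        cell = (int(lat // res), int(lon // res), year, month)
        groups[cell][platform].append(fco2)
    # Average within each platform first, then average the platform means.
    return {cell: mean(mean(values) for values in by_platform.values())
            for cell, by_platform in groups.items()}

# One ship value and three mooring values in the same cell and month (made-up numbers).
obs = [
    (35.10, 285.05, 2010, 6, "ship", 380.0),
    (35.12, 285.07, 2010, 6, "mooring", 400.0),
    (35.11, 285.06, 2010, 6, "mooring", 402.0),
    (35.13, 285.08, 2010, 6, "mooring", 404.0),
]
binned = platform_weighted_bins(obs)
```

In this example a plain average of the four values would be 396.5, whereas the platform-weighted value is 391.0, because the three mooring values are collapsed to a single mean before being averaged with the ship value.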

Binned observations were grouped into eleven LMEs defined according to the United States Exclusive Economic Zone (EEZ), in accordance with the practice of the NOAA EIWG (Table 1; Fig. 2). Platform-weighted fCO₂ from SOCATv2023 observations (fCO_2(SOCAT)) in each of these grid cells over time shows large-scale patterns of spatial variability (Fig. 2a), such as relatively high fCO_2(SOCAT) at the equator and relatively low fCO_2(SOCAT) surrounding Alaska, and of temporal variability (Fig. 2b), such as relatively high standard deviation in fCO_2(SOCAT) observations surrounding Alaska and near the coastlines of the continental U.S. and relatively low standard deviation around the Pacific Islands, in each case compared to the region as a whole. The distribution of the total number of months sampled within each 0.25° × 0.25° grid cell and the number of months of the year sampled at least once across the full dataset (1998–2022) within each grid cell reveal consistent patterns (Fig. 2c,d). The Northeast U.S. has especially high observational coverage (9.5% of all 0.25° × 0.25° monthly grid cells covered and 63.6% of seasonally binned 0.25° × 0.25° grid cells covered); the Southeast U.S. (4.0% total, 50.0% seasonal), Gulf of Mexico (5.2% total, 53.1% seasonal), Caribbean Sea (6.6% total, 40.2% seasonal), and California Current System (3.0% total, 42.1% seasonal) have moderately high observational coverage (Table 1). Observational coverage generally decreases farther offshore.

Table 1 Summary information for the eleven U.S. large marine ecosystems (LMEs) considered in this study.

Next, gridded fields of satellite, reanalysis, and in situ observational products were downloaded from the sources detailed in Table 2. When applicable, these fields were re-gridded using standard interpolation functions to match the resolution and/or central grid cell positions of the binned fCO_2(SOCAT) observations. In many cases, multiple datasets could be chosen, but preference was given to those that were provided at 0.25° resolution and that covered the relevant time and space. Rigorous comparison between different input datasets is planned for future development of RFR-LMEs as they are prepared for dynamic, operational production. Sea surface temperature (SST; Fig. 2e) and ice concentration were obtained from the NOAA Optimum Interpolation Sea Surface Temperature version 2 (OISSTv2) product at daily, 0.25° × 0.25° resolution³⁸; values were averaged to monthly resolution. Sea surface salinity (SSS; Fig. 2f) and mixed layer depth (MLD) were obtained from the Copernicus Marine Environment Monitoring Service (CMEMS) Global Ocean Ensemble Physics Reanalysis (GLORYS) product at monthly, 0.25° resolution³⁹. Sea surface height (SSH) was obtained from the CMEMS satellite gridded product, which is produced at monthly, 0.25° resolution by optimal interpolation of along-track measurements from available altimeter missions⁴⁰. Sea surface chlorophyll (CHL) was obtained from the National Aeronautics and Space Administration (NASA) Ocean Colour Level-3 Mapped Chlorophyll Data product at monthly, 1/12° resolution and re-gridded to 0.25° resolution^41,42. One-dimensional, linear interpolation was used within each grid cell to fill gaps in the chlorophyll dataset. Wind speed (Fig. 2g) was obtained from the fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis for the global climate and weather (ERA5) at monthly, 0.25° resolution⁴³. Bathymetry (Z, Fig. 2h) was obtained from the ETOPOv2022 Global Relief Model at 1/60° resolution and re-gridded to 0.25° resolution⁴⁴. Sea level pressure (SLP) was obtained from the National Centers for Environmental Prediction/Department of Energy (NCEP/DOE) Reanalysis II model at monthly, 2.5° resolution and interpolated to 0.25° resolution⁴⁵. Atmospheric pCO₂ was obtained from the NOAA Marine Boundary Layer (MBL) product at weekly resolution and varying latitudinal resolution and was re-gridded to monthly, 0.25° resolution⁴⁶. Binned observations of fCO_2(SOCAT) were co-located in both time and space with the gridded predictors in preparation for algorithm training.
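The one-dimensional gap filling applied to the chlorophyll fields can be sketched as follows for the time series of a single grid cell. This is an illustration only: the function name and sample values are hypothetical, and the choice to leave leading and trailing gaps unfilled is an assumption, since the text does not say how gaps at the ends of a series were treated.

```python
def fill_gaps_linear(series):
    """Linearly interpolate interior gaps (None) in one grid cell's monthly time series."""
    filled = list(series)
    known = [i for i, value in enumerate(filled) if value is not None]
    # Walk over consecutive pairs of known points and fill the months between them.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled  # assumption: gaps before the first or after the last known value stay None

# Made-up monthly chlorophyll values with missing months.
chl = [0.20, None, None, 0.50, 0.40, None, 0.60, None]
filled = fill_gaps_linear(chl)
```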

Table 2 Sources of gridded fields of satellite, reanalysis, and in situ observational products used to create RFR-LME maps.

As one of several validation exercises, the pCO₂ reconstructions from our method were compared to other mapped pCO₂ data products downloaded from SeaFlux (v2021.04)⁴⁷, which is an ensemble of six surface pCO₂ products that enables users to calculate air–sea CO₂ flux consistently across the global ocean²⁷. SeaFlux harmonizes six data-based pCO₂ products: CMEMS-FFNN^9,48, MPI-SOMFFN^33,49, and NIES-FNN⁵⁰, which are each constructed using neural networks; JENA-MLS³⁰, which is constructed based on a mixed layer scheme; JMA-MLR¹⁰, which is constructed using multiple linear regressions; and CSIR-ML6³⁶, which is constructed using an ensemble of multiple machine-learning techniques. In an additional exercise, RFR-LME mapped data products were evaluated through comparisons to co-located, independently calculated OA indicators from research cruises included in the Global Ocean Data Analysis Project database (GLODAPv2.2022)⁵¹ and the Coastal Ocean Data Analysis Project – North America database (CODAP-NA)⁵². Measurements of SST, SSS, A_T, C_T, and nutrients were obtained from the GLODAP and CODAP-NA databases, then filtered to retain only observations with good quality flags for each of those variables that were collected at a depth of 10 meters or less.

Spatial clustering

Three clustering methods were tested: self-organizing mapping (a neural-network-based method of producing a low-dimensional representation of a set of input data); k-means clustering (an iterative method that optimizes a defined number of centroids by minimizing the in-cluster distances from the centroid for a multidimensional dataset); and Gaussian mixture modelling (GMM)⁵³ (a method of clustering that assumes a multidimensional dataset is represented by a mixture of several Gaussian distributions with different properties). Of these methods, preliminary testing suggested that GMM provided the best results in terms of k-fold cross-validated root-mean-square error (RMSE) in fCO₂ (described in the following section) after RFRs were fit for each cluster. In addition, GMM clustering provides the probability that each spatiotemporal grid cell belongs to a given cluster, whereas the other two methods provide only a single cluster assignment for each grid cell. These probabilities are used in our method to mitigate discontinuities at boundaries between clusters.
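To illustrate what the cluster membership probabilities from a Gaussian mixture look like, the sketch below evaluates them for a one-dimensional mixture with known parameters. It is a simplification under stated assumptions: the study fitted multidimensional mixtures with full covariance matrices, whereas here the component weights, means, and standard deviations are hypothetical and no fitting is performed.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def membership_probabilities(x, components):
    """components: list of (weight, mean, std); returns the posterior probability of each."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [value / total for value in weighted]

# Two hypothetical clusters along a single variability feature (e.g., SST standard deviation).
components = [(0.5, 1.0, 0.5), (0.5, 3.0, 0.5)]
probs_mid = membership_probabilities(2.0, components)   # cell midway between clusters
probs_near = membership_probabilities(1.2, components)  # cell close to the first cluster
```

A cell midway between the two cluster centers receives equal probabilities, while a cell near one center is assigned almost entirely to that cluster. This graded membership is what a hard assignment from k-means or self-organizing mapping does not provide.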

Variability (defined as the standard deviation within a spatial grid cell over time) in SLP, SST, and CHL were used as feature sets to form clusters in most LMEs; CHL was replaced with wind speed in two LMEs (BS and NBCS) due to insufficient CHL observations at high latitudes. The decisions to cluster based on variability over time instead of monthly values and to use the specified sets of variables were based on initial testing and optimization in terms of k-fold cross-validated RMSE in fCO₂ (not shown). Future development of RFR-LMEs may continue to explore alternative clustering strategies.

GMM models with full, unshared covariance matrices were created using the MATLAB “fitgmdist” function. Full covariance matrices were used for GMM based on the a priori assumption that some of the predictor variables were correlated due to the nature of oceanographic environmental variables. Covariance matrices for GMM were unshared based on the a priori assumption that each spatial cluster had its own, different covariance matrix. The number of components (i.e., clusters; N in Table 1) was optimized, primarily by minimizing the k-fold cross-validated RMSE in fCO₂, but also taking into account the Bayesian information criterion — a measure of model fit that includes a penalty for the number of clusters — and silhouette score — a measure of the accuracy of the clustering technique that is calculated by comparing each point’s similarity to the other points in its assigned cluster to how dissimilar it is to the points in the next nearest cluster (Fig. 3).
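For reference, the two secondary selection metrics mentioned above are commonly defined as follows. These are the standard textbook forms; the text does not state which exact variants were computed. Here $k$ is the number of free model parameters, $n$ the number of data points, $\hat{L}$ the maximized likelihood, $a(i)$ the mean distance from point $i$ to the other points in its own cluster, and $b(i)$ the mean distance from point $i$ to the points in the nearest other cluster:

$$\mathrm{BIC} = k\,\ln(n) - 2\,\ln(\hat{L})$$

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

Lower BIC values indicate a better trade-off between fit and complexity, and silhouette scores closer to 1 indicate better-separated clusters.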

Machine learning regressions

Once the numbers of spatial clusters were determined for each LME, random forest regressions (RFRs)⁵⁴ were trained for each cluster within each LME using binned fCO_2(SOCAT) as a target variable and each of the co-located gridded variables listed in Table 2 along with longitude (degrees east with a 0° to 360° convention), latitude, distance from the coast, month of the year (sine- and cosine-transformed to maintain cyclicity throughout the year and predictability within each month), and year as predictors. These variables were found to be useful predictors of fCO₂ by Sharp et al.²⁴.
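The sine and cosine transformation of the month of the year can be sketched as below. The phase convention (month number divided by 12) and the function names are assumptions made for illustration; the text does not give the exact formula used.

```python
import math

def encode_month(month):
    """Map month 1-12 onto the unit circle so that December and January are neighbors."""
    angle = 2.0 * math.pi * month / 12.0  # assumed phase convention
    return math.sin(angle), math.cos(angle)

def distance(a, b):
    """Euclidean distance between two encoded months."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

dec_to_jan = distance(encode_month(12), encode_month(1))
jun_to_jul = distance(encode_month(6), encode_month(7))
```

With the raw month number, December (12) and January (1) are eleven units apart; after the transformation every pair of adjacent months is separated by the same distance, and using both sine and cosine gives each month a unique pair of values.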

RFRs are a collection of regression “trees”, each of which is trained with a bootstrapped subset of the dataset. Each tree aims to generate a representation of the relationship between the predictor variables and the target variable for its bootstrapped subset of the data. This is done by splitting the data into a series of “branches” based on the predictors. At each branch point, only a random subset of the predictor variables is made available to the algorithm. The algorithm then optimally selects a predictor dataset and a specific value from that dataset on which to split the dataset into two additional branches/groups with the lowest possible within-group fCO_2(SOCAT) variance. This continues until the branches become “leaves”, which means they are no longer split, either due to reaching a defined minimum leaf size or a certain criterion (e.g., variance of the remaining fCO_2(SOCAT) observations). The use of an ensemble of regression trees constitutes the “forest” aspect of an RFR. The “randomness” aspect of the forest is due to the fact that each tree is constructed with different subsets of the full dataset and that different subsets of the predictors are available at each branch point, making it possible for each tree to provide a slightly different empirical regression for the dataset. New predictor data can be passed through each tree in the ensemble of a trained RFR, and an average of the values output from each tree is the fCO₂ prediction (fCO_2(RFR-LME)).
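The variance-minimizing split at a single branch point can be sketched as follows, for one predictor and one split. This is a teaching example rather than a random forest: bootstrapping, random predictor subsets, and recursive splitting are omitted, and the function name and sample values are hypothetical.

```python
def best_split(x, y):
    """Return (threshold, score): the split on x with the lowest summed within-group squared deviation of y."""
    def sse(values):
        # Sum of squared deviations from the group mean (proportional to within-group variance).
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical predictor values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        score = sse([v for _, v in pairs[:i]]) + sse([v for _, v in pairs[i:]])
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up SST (predictor) and fCO2 (target) values forming two distinct groups.
sst = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
fco2 = [350.0, 352.0, 351.0, 400.0, 402.0, 401.0]
threshold, score = best_split(sst, fco2)
```

Here the chosen threshold falls at 16.0, between the cold, low-fCO₂ group and the warm, high-fCO₂ group. In a full forest this search is repeated over a random subset of predictors at every branch point of every tree.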

For each cluster, all grid cells with a GMM probability of greater than 10% for that cluster were used to train an RFR using the MATLAB “TreeBagger” function. This means that many grid cells on the geographic boundary between two or more clusters may have been used to train multiple RFRs. The number of trees used for each RFR was set to 1000, which was confirmed to be sufficient through visual inspection of the out-of-bag RMSE with respect to the number of trees (not shown). The minimum leaf size was set to three based on k-fold cross-validation testing, and the number of predictors used for each decision split was set to 6 (equal to the total number of predictors divided by three and rounded up to the nearest whole number).

To create an RFR-LME map of fCO₂ for each LME, all the gridded predictor variables (0.25° × 0.25°, monthly) within the LME were run through each cluster-specific RFR. This produced N fCO_2(RFR-LME) maps for each LME, where N is equal to the number of clusters. These maps were then merged as weighted average fCO_2(RFR-LME) maps using the GMM probabilities as weights, which helped to smooth out discontinuities between clusters. Lastly, RFR-LME maps of fCO₂ were converted to maps of pCO₂ (pCO_2(RFR-LME)) using SST and SLP⁵⁵.
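The probability-weighted merging of cluster-specific maps can be sketched as below. The nested-list layout, function name, and values are hypothetical, and the sketch assumes the raw GMM probabilities are used directly as weights; the text does not say whether any thresholding was applied at this step.

```python
def merge_cluster_maps(predictions, probabilities):
    """predictions[k][i] and probabilities[k][i] refer to cluster k and grid cell i."""
    merged = []
    for i in range(len(predictions[0])):
        weights = [prob[i] for prob in probabilities]
        values = [pred[i] for pred in predictions]
        # Weighted average across clusters, normalized by the total weight.
        merged.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return merged

# Two clusters, three grid cells: the middle cell sits on the cluster boundary.
predictions = [[380.0, 390.0, 400.0], [420.0, 410.0, 405.0]]
probabilities = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
merged = merge_cluster_maps(predictions, probabilities)
```

Cells that belong clearly to one cluster take that cluster's prediction, while the boundary cell takes the midpoint of the two predictions, which is how the weighting smooths the transition between clusters.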

Cross-validation was used to evaluate the skill of the fCO_2(RFR-LME) estimates in each cluster and overall in each LME. This k-fold cross-validation was performed by sequentially withholding subsets of 20% of the data, training versions of the RFR algorithms with the remaining 80% of the data, then, for each data point in the withheld validation subset, comparing the fCO₂ obtained using the k-fold cross-validation algorithms (fCO_2(RFR-LME-kFold)) to the observed fCO_2(SOCAT) value. This procedure was repeated five times for each LME so that every data point was included in the validation data exactly once, producing a ΔfCO₂ value for each data point.
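The five-fold procedure can be sketched as follows. A trivial placeholder model that predicts the training mean stands in for the RFR, the fold assignment by random shuffling is an assumption (the text does not say how the subsets were drawn), and all names and values are hypothetical.

```python
import math
import random

def kfold_deltas(x, y, fit, k=5, seed=0):
    """Return one held-out residual (prediction minus observation) per data point."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint subsets of about 1/k of the data
    deltas = [None] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            deltas[i] = predict(x[i]) - y[i]
    return deltas

def fit_mean(x_train, y_train):
    """Placeholder model: predicts the training mean (stands in for a trained RFR)."""
    m = sum(y_train) / len(y_train)
    return lambda _x: m

x = list(range(20))
y = [350.0 + 2.0 * v for v in x]  # made-up target values
deltas = kfold_deltas(x, y, fit_mean)
rmse = math.sqrt(sum(d * d for d in deltas) / len(deltas))
```

Every data point receives exactly one residual from a model that never saw it during training, and the RMSE of those residuals is the cross-validated skill metric.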

Alkalinity and nutrient estimation

Sea surface total alkalinity (A_T), phosphate (PO₄), and silicate (Si(OH)₄) were estimated from gridded monthly fields of SSS and SST using Empirical Seawater Property Estimation Routines (ESPERs)³⁴. ESPERs consist of both locally interpolated multiple linear regressions (ESPER-LIR) and feed-forward neural networks (ESPER-NN) trained to estimate seawater properties from a given set of input properties. Though ESPERs are global in nature, the regionally tuned ESPER-LIR coefficients and spatial coordinate predictors in ESPER-NNs mean that ESPERs function similarly to regional property estimation algorithms. ESPERs also provide the benefit of estimating uncertainty corresponding to each predicted value, allowing for the propagation of those uncertainties through downstream computations. The ESPER-Mixed routine (an average of both the ESPER-LIR and ESPER-NN approaches) was used for this study, due to assessment statistics that have indicated a lower global RMSE for the ESPER-Mixed approach (e.g., a global average RMSE of 3.7 μmol kg⁻¹ for A_T) compared to ESPER-LIR (4.0 μmol kg⁻¹) and ESPER-NN (4.1 μmol kg⁻¹) when producing property estimates from SSS and SST³⁴.

Carbonate system calculations

CO₂ system calculations were performed using CO2SYSv3 for MATLAB⁵⁶ to determine additional ocean acidification (OA) indicators: dissolved inorganic carbon (C_T(RFR-LME)), pH on the total scale (pH_T(RFR-LME)), total hydrogen ion amount content ([H⁺]_T(RFR-LME)), total carbonate ion amount content ([CO₃²⁻]_T(RFR-LME)), saturation states for aragonite (Ω_ar(RFR-LME)) and calcite (Ω_ca(RFR-LME)), and Revelle factor (RF_(RFR-LME)). These calculations were performed using well established thermodynamic equations describing the chemistry of carbon dioxide in seawater^57,58. Input parameters to these equations were fCO_2(RFR-LME), along with ESPER-estimated A_T (A_T(ESPER)), phosphate (PO_4(ESPER)), and silicate (Si(OH)_4(ESPER)). Carbonic acid dissociation constants from Lueker et al.⁵⁹, the boric acid dissociation constant from Dickson⁶⁰, the total boron to salinity ratio from Lee et al.⁶¹, the dissociation constant of water from Dickson⁶², and the hydrofluoric acid dissociation constant from Perez and Fraga⁶³ were used in CO₂ system calculations. Uncertainties were propagated through these calculations (see following section).

Uncertainty estimation

Uncertainties in RFR-LME maps of fCO₂ were evaluated based on the previously described k-fold cross-validation approach. First, spatially gridded absolute values of ΔfCO₂ from k-fold cross-validation were low-pass filtered (using 0.5° × 0.5° windows) two times in each LME to begin to fill nearby grid cells with uncertainty values. Then, nearest-neighbor interpolation was used to fill any remaining empty grid cells with data-based, spatially scaled uncertainty values ($\mathrm{E}_{fCO_2(s)}$). This approach only assesses the strength of the fit where observations are available. It is therefore prudent to assign greater uncertainties for periods and regions where training data are less abundant or absent. For this reason, the $\mathrm{E}_{fCO_2(s)}$ values were further scaled over time by calculating two scaling factors specific to each LME, one representing the seasonal data coverage (using 3-month running means of the relative data coverage across the seasonal cycle) and another representing the relative annual data coverage (using 5-year running means of the relative data coverage across the timeseries).

The seasonal scaling factor ($\varepsilon_{seas.}$) was calculated as:

$$\varepsilon_{seas.} = \frac{\left(\sum_{my = my_{ref}-1}^{my_{ref}+1} n_{obs(my)} / n_{tot(my)}\right) / MY}{n_{obs(my)} / n_{tot(my)}}$$

where my is the numbered month of the year (1–12), my_ref is the reference month of the year for each time step (1–12), n_obs(my) is the number of grid cells with observations in the corresponding month of the year, n_tot(my) is the total number of available grid cells in the corresponding month of the year, and MY is the total number of months considered within the window for each time step. Because January (1) comes after December (12), my_ref − 1 = 12 when my_ref = 1 and my_ref + 1 = 1 when my_ref = 12. The long-term scaling factor ($\varepsilon_{ann.}$) was calculated as:

$$\varepsilon_{ann.} = \frac{\left(\sum_{ms = ms_{ref}-24}^{ms_{ref}+24} n_{obs(ms)} / n_{tot(ms)}\right) / MS}{n_{obs(ms)} / n_{tot(ms)}}$$

where ms is the numbered month in the full series (1–228), ms_ref is the reference month in the series for each time step (1–228), n_obs(ms) is the number of grid cells with observations in the corresponding month of the series, n_tot(ms) is the total number of available grid cells in the corresponding month, and MS is the total number of months considered within the window for each time step. Fewer months were considered within each window near the beginning and end of the time series. Finally, the estimated uncertainty of fCO_2(RFR-LME) scaled spatially and temporally (i.e., seasonally and annually) was calculated as:

$$\mathrm{E}_{fCO_2(s,t)} = \mathrm{E}_{fCO_2(s)} \times \varepsilon_{seas.} \times \varepsilon_{ann.}$$

The window sizes of the scalers were selected to balance data coverage in each time window with realistic periods of time over which observational data may exhibit serial correlations.
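The windowing behind the two scaling factors can be sketched as follows. The sketch computes only the windowed mean data coverage (the running means of n_obs/n_tot), not the complete scaling factors; the function names and coverage values are hypothetical.

```python
def seasonal_window_mean(coverage, my_ref):
    """coverage: 12 monthly coverage fractions (index 0 = January); my_ref is 1-12."""
    # Wrap around the calendar so that January's window is December, January, February.
    months = [((my_ref - 1 + offset) % 12) + 1 for offset in (-1, 0, 1)]
    return sum(coverage[m - 1] for m in months) / len(months), months

def annual_window_mean(coverage, ms_ref, half_width=24):
    """coverage: coverage fraction for every month of the series; ms_ref is 1-based."""
    # The window is truncated near the start and end of the series.
    lo = max(1, ms_ref - half_width)
    hi = min(len(coverage), ms_ref + half_width)
    window = coverage[lo - 1:hi]
    return sum(window) / len(window), len(window)

# Made-up seasonal coverage fractions with a summer sampling maximum.
monthly_cov = [0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.12, 0.10, 0.08, 0.05, 0.03, 0.01]
jan_mean, jan_months = seasonal_window_mean(monthly_cov, 1)

series_cov = [0.05] * 60  # hypothetical five-year series with constant coverage
_, n_start = annual_window_mean(series_cov, 1)
_, n_middle = annual_window_mean(series_cov, 30)
```

The January window wraps to include December, and the long-term window contains 49 months in the interior of the series but only 25 at the first time step, consistent with fewer months being considered near the ends of the time series.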

Uncertainties in ESPER-estimated A_T and nutrients were provided by the ESPER algorithms, which estimate uncertainty using a polynomial fit to salinity and depth. The ESPER algorithms are less skillful in the surface ocean where we use them than they are globally across all depths, and the uncertainty estimates are correspondingly greater at shallow depths.

The uncertainty estimates were propagated along with standard estimated total uncertainties in carbonate system constants (see Table 1 in Orr et al.⁶⁴) to calculate uncertainty in mapped OA indicators. Gaussian uncertainty propagation was employed, using CO2SYSv3 for MATLAB⁵⁶, which is based on uncertainty propagation code introduced in CO2SYSv2 by Orr et al.⁶⁴.
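First-order (Gaussian) uncertainty propagation can be illustrated with a single-variable example: converting an uncertainty in hydrogen ion content into an uncertainty in pH. This is a minimal sketch that assumes independent inputs and uses numerical derivatives; it does not reproduce the CO2SYS routines, and the function names and input values are hypothetical.

```python
import math

def propagate(func, values, uncertainties, rel_step=1e-6):
    """First-order propagation for independent inputs, using central differences."""
    variance = 0.0
    for i, (value, u) in enumerate(zip(values, uncertainties)):
        step = abs(value) * rel_step or rel_step
        up = list(values)
        up[i] = value + step
        down = list(values)
        down[i] = value - step
        sensitivity = (func(*up) - func(*down)) / (2.0 * step)  # partial derivative
        variance += (sensitivity * u) ** 2
    return math.sqrt(variance)

def ph_from_h(h_conc):
    """pH as the negative base-10 logarithm of the hydrogen ion content."""
    return -math.log10(h_conc)

h_conc = 1.0e-8  # illustrative value corresponding to pH 8
u_h = 2.0e-10    # illustrative 2% uncertainty
u_ph = propagate(ph_from_h, [h_conc], [u_h])
analytic = u_h / (h_conc * math.log(10.0))  # exact first-order result for this function
```

For this function the numerical result matches the analytic expression, giving a pH uncertainty of roughly 0.009 for a 2% uncertainty in hydrogen ion content. With several inputs, the squared contributions of each input are summed, as in the loop above.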

Validation and evaluation

The skill of the RFR-LME maps was evaluated through comparisons with co-located OA indicators independently calculated from the ship-based GLODAPv2.2022 and CODAP-NA measurements described above. OA indicators were computed at in situ temperature from the A_T and C_T observations using CO2SYSv3 for MATLAB⁵⁶ and the same equilibrium constants as before. Although the GLODAPv2.2022 and CODAP-NA databases also include pH_T and pCO₂ measurements, they are not as widespread as A_T and C_T measurements, so we chose to calculate all indicators from A_T and C_T for evaluation. Each observation was then co-located with the corresponding RFR-LME grid cell and compared.
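Co-locating a discrete observation with a monthly 0.25° grid cell can be sketched as an index calculation. The sketch assumes that cell edges are aligned with the southwest corner of the study quadrangle (18°S, 140°E, with longitude on a 0° to 360° convention) and that the series starts in January 1998; the actual grid definition in the data files may differ, and the function names and sample observation are hypothetical.

```python
import math

def grid_index(lat, lon, lat_min=-18.0, lon_min=140.0, res=0.25):
    """Row and column of the grid cell containing (lat, lon); assumes edge-aligned cells."""
    return (math.floor((lat - lat_min) / res), math.floor((lon - lon_min) / res))

def month_index(year, month, start_year=1998):
    """Zero-based index of a calendar month within a series starting in January of start_year."""
    return (year - start_year) * 12 + (month - 1)

# Hypothetical shipboard sample: 36.6°N, 238.1°E, June 2015.
row, col = grid_index(36.6, 238.1)
t = month_index(2015, 6)
```

The resulting (row, col, t) triple identifies the mapped value to compare against the observation-based indicator.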

In addition, RFR-LME maps were compared to global mapped data products of sea surface pCO₂ obtained from SeaFlux (v2021.04)⁴⁷. Long-term averages of pCO₂ from RFR-LME maps and SeaFlux maps were computed across the overlapping time periods of both products (i.e., 1998–2019). Mapped differences between RFR-LME and each SeaFlux ensemble member, as well as an average across the ensemble, were computed and compared.

Finally, observations of pCO₂ at fixed buoy locations were compared to pCO₂ from RFR-LME data products at grid cells corresponding to those moored buoy observations. For this exercise, special-case RFR-LME maps were created by training RFRs on gridded fCO_2(SOCAT) data with buoy observations excluded, then using those algorithms to construct the maps. Comparing pCO₂ mapped from datasets both with and without the underlying buoy observations allowed for evaluation of the influence that those seasonally resolved observations have on the fidelity of the pCO₂ reconstruction. pCO₂ values extracted from the mapped SeaFlux datasets were also included in this comparison, allowing for separate evaluation of how the LME-scale, 0.25° × 0.25° monthly reconstructions compare to global 1° × 1° monthly reconstructions.

Data Records

RFR-LME maps can be accessed through the NOAA National Centers for Environmental Information (NCEI) via the Ocean Carbon and Acidification Data System (OCADS; https://doi.org/10.25921/h8vw-e872)³⁵. The dataset is available in NetCDF format on 0.25° × 0.25° spatial grids at monthly timesteps. Each mapped OA indicator and its uncertainty are provided in a separate NetCDF file, along with a reference grid that indicates to which LME each spatial grid cell belongs. Additionally, regional timeseries for CO₂ partial pressure, calcium carbonate saturation state, and pH are displayed at the NOAA National Marine Ecosystem Status (NaMES) website (https://ecowatch.noaa.gov). Average values, trends, seasonal amplitudes, and uncertainty estimates of ocean acidification indicators from RFR-LMEs vary considerably among the regions (Tables 3–6).

Table 3 Long-term mean values for OA indicators in each LME.

Table 4 Long-term trends and uncertainties for OA indicators in each LME.

Table 5 Seasonal amplitudes for OA indicators in each LME.

Table 6 Average uncertainties for OA indicators in each LME.

Long-term means (Table 3; Fig. 4) allow for the description of LME-scale patterns in surface ocean carbonate chemistry. Tropical LMEs (PI and CS) are characterized most notably by high carbonate ion parameters ([CO₃²⁻]_T, Ω_ar, and Ω_ca) and low RF values. Within this pair, the CS can be described as more acidified (higher pCO₂ and lower pH_T) but better buffered (lower RF and higher A_T/C_T ratio). Subtropical Atlantic LMEs (GM and SE) also have high carbonate ion parameters ([CO₃²⁻]_T, Ω_ar, and Ω_ca) and low RF values. Compared to the tropical LMEs, however, the subtropical Atlantic LMEs have higher C_T and A_T values, although A_T/C_T ratios and therefore RFs are similar between the two groups. Temperate and subarctic coastal LMEs (CCS, GA, and NE) can generally be considered intermediate in all parameters: pCO₂, pH, carbonate ion parameters, C_T, A_T, and RF. Within the group, the GA has the highest RF and lowest carbonate ion parameters, the NE has the lowest RF and highest carbonate ion parameters, and the CCS is between the two. Subarctic North Pacific LMEs (AI and EBS) are characterized by high C_T, pCO₂, and RF and by low pH_T and carbonate ion parameters. Arctic LMEs (NBCS and BS) are characterized by high pH_T and RF and by low A_T, pCO₂, and carbonate ion parameters.

Spatial variability in OA indicators is evident within each LME and throughout the seasonal cycle (Fig. 5). For example, the CCS develops a strong dipole in the summer (June/July/August; Fig. 5c), with low C_T off the coast in the northern CCS and high C_T off the coast in the central CCS. This dipole becomes much weaker in the winter (December/January/February; Fig. 5a). Similarly, relatively low C_T occurs off the coast in the northern NE region in the summer but disappears in the winter (Fig. 5c). The southern continental Alaskan coastline exhibits low C_T, especially nearshore in the summer, whereas the northern Alaskan coastline is relatively higher in C_T than nearby offshore waters in the Arctic Ocean. A band of relatively low C_T is evident from about 10° to 20° N in the PI region, between higher C_T in the equatorial Pacific and North Pacific subtropical gyre, a feature that has appeared in other sea surface C_T data products⁶.

Mapped indicator uncertainties (see Fig. 6) are served alongside RFR-LME maps³⁵, providing a resource for evaluating uncertainty in OA indicator values at a given location. Area-weighted mean u[pCO_2(RFR-LME)] was 12.0 μatm across the entire domain, u[pH_T(RFR-LME)] was 0.015, and u[Ω_ar(RFR-LME)] was 0.18. These domain-wide means are influenced by the large area and low uncertainties in the Pacific Islands region; individual LME uncertainties, particularly in Arctic and subarctic LMEs, may be considerably larger. Spatial patterns of uncertainties also differ for different OA indicators. For example, u[Ω_ar(RFR-LME)] tends to be relatively high in the tropical LMEs (Fig. 6d), where Ω_ar(RFR-LME) is also high (Fig. 4e); on the other hand, u[pCO_2(RFR-LME)] is extremely low in the tropical LMEs (Fig. 6b).

Uncertainty values reflect not only uncertainty in the RFR predictions, but also uncertainty introduced by interpolating over spatial and temporal gaps in observational coverage. Average uncertainty values for each LME are presented alongside OA indicator timeseries on the NOAA NaMES website. Importantly, the uncertainty values provided in Table 6 and on the NaMES website represent weighted means of grid-cell-level uncertainties rather than uncertainties corresponding to region-wide averages. Uncertainties in region-wide averages may be smaller, because some errors cancel under areal averaging, or larger, because of inadequacies of our spatiotemporal scaling approach for representing uncertainties in under-sampled times and locations.
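The difference between a weighted mean of grid-cell uncertainties and the uncertainty of a regional mean can be illustrated numerically. The Python sketch below assumes cosine-of-latitude area weights and shows two textbook limiting cases (fully independent and fully correlated cell errors); it does not reproduce the spatiotemporal scaling used for RFR-LME, and the function names and input values are hypothetical.

```python
import numpy as np


def area_weights(lats_deg):
    """Normalized cos(latitude) weights, one per grid cell (an assumption)."""
    w = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    return w / w.sum()


def regional_uncertainty_summaries(u_cells, lats_deg):
    """Return three summaries of grid-cell uncertainties for one region.

    1. weighted mean of the cell-level uncertainties
    2. uncertainty of the regional mean if cell errors are independent
    3. uncertainty of the regional mean if cell errors are fully correlated
    """
    w = area_weights(lats_deg)
    u = np.asarray(u_cells, dtype=float)
    mean_of_cell_u = float(np.sum(w * u))
    u_mean_independent = float(np.sqrt(np.sum((w * u) ** 2)))
    u_mean_correlated = mean_of_cell_u  # errors add linearly in this limit
    return mean_of_cell_u, u_mean_independent, u_mean_correlated


# Four hypothetical cells at the same latitude, each with u = 12 uatm.
summaries = regional_uncertainty_summaries([12.0, 12.0, 12.0, 12.0],
                                           [30.0, 30.0, 30.0, 30.0])
print(summaries)
```

In this toy case the independent-error limit is half the mean cell uncertainty, which shows how cancelling errors can shrink the uncertainty of a regional average. The sketch does not capture the opposite case described above, where under-sampling makes the true regional uncertainty larger.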

Technical Validation

Data-based validation

A k-fold cross-validation approach was used to assess the skill of the fCO₂ estimates and subsequent OA indicator calculations. Region-wide error statistics for each of the eleven LMEs (before the spatial and temporal scaling) indicate that fCO_{2(RFR-LME-kFold)} values are centered on the measured values of fCO_2(SOCAT) (mean and median errors are all close to zero) and tend to correlate closely with them (nine of the eleven R² values are 0.8 or greater; Table 7). Root mean square errors (RMSEs) are generally about three times larger than median absolute errors, indicating error populations with long tails of a few particularly large errors. When viewed spatially (Fig. 6a), absolute differences (|ΔfCO₂|) between fCO_{2(RFR-LME-kFold)} and fCO_2(SOCAT) are greatest near the coast and in the North Pacific and Arctic, and smallest in the open ocean and in the tropics and subtropics. High |ΔfCO₂| values tend to correlate with areas of high background variability in fCO_2(SOCAT) (Fig. 2b), emphasizing that the RFR algorithms may struggle to capture extreme values, which is consistent with the aforementioned long-tailed error populations.
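The error statistics discussed above can be computed with a few lines of NumPy. In this sketch R² is taken as the squared Pearson correlation, which is an assumption, since the exact definition used for Table 7 is not stated here; the function name `error_stats` and the example values are hypothetical.

```python
import numpy as np


def error_stats(predicted, observed):
    """Summary statistics for errors defined as predicted minus observed."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    err = p - o
    return {
        "mean_error": float(np.mean(err)),
        "median_error": float(np.median(err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "median_abs_error": float(np.median(np.abs(err))),
        # Squared Pearson correlation (assumed definition of R^2).
        "r2": float(np.corrcoef(p, o)[0, 1] ** 2),
    }


# Hypothetical fCO2 values (uatm); the last prediction is a large outlier.
observed = [350.0, 360.0, 370.0, 380.0, 390.0, 400.0]
predicted = [351.0, 359.0, 371.0, 379.0, 391.0, 430.0]
stats = error_stats(predicted, observed)
print(stats)
```

A single large error inflates the RMSE far more than the median absolute error, which is the long-tail signature described in the text.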

Table 7 Error statistics of fCO₂ predicted by k-fold cross-validation algorithms (fCO_{2(RFR-LME-kFold)}) compared to fCO₂ from SOCAT observations (fCO_2(SOCAT)).

Comparison to global trends

RFR-LME indicator timeseries (1998–2022) represent spatially weighted annual averages of OA indicators computed from RFR-LME maps. Increasing pCO_2(RFR-LME) and decreasing pH_T(RFR-LME) are observed in each LME (Figs. 7, 8), trends that are strongly influenced by anthropogenic CO₂ uptake and amplified by ocean warming (Table 8). Ω_ar(RFR-LME) decreases in many (but not all) LMEs over 1998–2022 (Fig. 9), as Ω_ar decline is driven by anthropogenic CO₂ uptake as well, but moderated by ocean warming and also influenced by changes in SSS (Table 8). Trends in OA indicators across U.S. LMEs (Table 4) can be compared with global trends of about +1.5 μatm yr⁻¹ for pCO₂ (+0.3 to +1.8 μatm yr⁻¹ for RFR-LMEs), +0.9 μmol kg⁻¹ yr⁻¹ for C_T (−0.2 to +1.0 μmol kg⁻¹ yr⁻¹ for RFR-LMEs), −1.7·10⁻³ units yr⁻¹ for pH_T (−1.8·10⁻³ to −0.2·10⁻³ yr⁻¹ for RFR-LMEs), and −7.0·10⁻³ yr⁻¹ for Ω_ar¹,⁴,⁶,¹⁰ (−7.3·10⁻³ to +2.6·10⁻³ yr⁻¹ for RFR-LMEs).
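A spatially weighted average and a long-term trend of this kind can be sketched as follows. The example assumes cosine-of-latitude area weights and an ordinary least-squares slope through annual means; the trend method actually used for Table 4 may differ, and the function names and synthetic series are hypothetical.

```python
import numpy as np


def area_weighted_mean(field, lats_deg):
    """Area-weighted mean of a (n_lat, n_lon) field, ignoring NaN cells."""
    field = np.asarray(field, dtype=float)
    w = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))[:, None]
    w = w * np.ones_like(field)
    valid = np.isfinite(field)
    return float(np.sum(np.where(valid, field * w, 0.0)) / np.sum(w[valid]))


def annual_trend(monthly_series, start_year=1998):
    """Least-squares slope (per year) through annual means of monthly data."""
    monthly = np.asarray(monthly_series, dtype=float)
    n_years = monthly.size // 12
    annual = monthly[: n_years * 12].reshape(n_years, 12).mean(axis=1)
    years = start_year + np.arange(n_years)
    slope, _intercept = np.polyfit(years, annual, 1)
    return float(slope), annual


# Synthetic 25-year pCO2 series: 1.5 uatm/yr trend plus a seasonal cycle.
t = np.arange(300) / 12.0
series = 360.0 + 1.5 * t + 20.0 * np.sin(2.0 * np.pi * t)
slope, annual = annual_trend(series)

# Small field with one masked cell, rows at 0 and 60 degrees latitude.
mean_val = area_weighted_mean([[1.0, np.nan], [3.0, 3.0]], [0.0, 60.0])
print(f"trend = {slope:.3f} uatm/yr, weighted mean = {mean_val:.3f}")
```

Averaging to annual values before fitting removes the seasonal cycle in this synthetic case, so the fitted slope recovers the imposed trend.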

Table 8 Long-term mean values, trends, and seasonal amplitudes for temperature and salinity in each LME.

It is important to note that, for some of the Arctic and subarctic LMEs that are characterized by high seasonal ice coverage, these trends are driven primarily by summertime OA indicator values (see inset plots in Figs. 7–9). This limitation, along with the fact that these timeseries are relatively short (25 years) and regionally limited, can explain divergence in some specific cases from the global trends.

Comparison to discrete shipboard data

The RFR-LME fields presented in this work are constructed using surface CO₂ measurements from shipboard flow-through analyzers. This automated observational approach allows for the collection of high spatial and temporal resolution observations of surface ocean carbonate chemistry. Discrete bottle measurements of carbonate chemistry parameters represent another approach for monitoring ocean acidification. The discrete approach allows for high-quality observations throughout the water column. Here we take near-surface discrete bottle measurements of A_T and C_T from GLODAPv2.2022⁵¹ and CODAP-NA⁵², use those measurements to calculate OA indicators, and compare those calculated values with mapped surface OA indicators from RFR-LMEs.

RFR-LME indicator values are generally in good agreement with calculations from discrete bottle measurements (Fig. 10). Compared to the k-fold-validation-based uncertainty estimates (Table 6), a greater spread (i.e., larger IQRs) in the differences between GLODAP/CODAP and RFR-LME values is expected in this exercise for two reasons. First, uncertainty stemming from CO₂ system calculations will contribute to the spread (e.g., Orr et al.⁶⁴), since GLODAP/CODAP indicator values are calculated from A_T and C_T and RFR-LME indicator values are calculated from fCO₂ and A_T. As an example, average propagated uncertainties for GLODAP/CODAP calculations using standard measurement errors for A_T and C_T (2 μmol kg⁻¹ for both) and for equilibrium constants⁶⁴ were calculated as 12.0 μatm for pCO₂, 0.014 for pH_T, and 0.11 for Ω_ar. Second, the two datasets are fundamentally different in their spatiotemporal resolution. RFR-LME grid cells represent averages for large swaths of the surface ocean over a monthly timestep, whereas shipboard measurements are appropriate for a distinct point in space at a distinct time. This spatiotemporal mismatch is especially noteworthy in the coastal ocean, where diurnal and other sub-monthly modes of variability operate over spatial scales much finer than 0.25 degrees of latitude or longitude. The calculations from bottle measurements also tend to indicate higher pCO₂ and therefore lower pH_T and Ω_ar. These offsets between the two datasets may be partly related to inconsistencies in carbonate chemistry calculations, whereby calculations from A_T and C_T at most surface conditions tend to produce lower pH_T (and higher pCO₂) values than corresponding measurements of those properties⁶⁵,⁶⁶.

Comparison to moored buoy time series data

Timeseries of pCO₂ from fixed grid cells of RFR-LME maps and RFR-LME maps constructed without moored buoy observations (RFR-LME-NM) were compared to pCO₂ observations at fixed buoy locations that were extracted from the SOCATv2023 database and aggregated in monthly bins. This provides a test of the capacity of RFR-LMEs to reproduce monthly variability in validation measurements that were withheld from training, and can be considered an assessment of the RFR-LME skill with monthly variability generally. Mapped global data products of surface ocean carbonate chemistry obtained from SeaFlux²⁷,⁴⁷ were also compared to the moored buoy observations.

Differences between moored buoy observations and mapped products (Table 9; Fig. 11) suggest that timeseries extracted from our regionally focused RFR-LME maps more meaningfully reflect observed pCO₂ than those from mapped global products. Like RFR-LME, most of these alternative products were trained from versions of SOCAT that include the buoy observations. The average median (± IQR) ΔpCO₂ (pCO_2(moor.) − pCO_2(grid)) was 0.1 ± 23.6 μatm for RFR-LME, compared with −13.0 ± 49.2 μatm for the RFR-LME-NM product, which excluded these observations from the training data. These larger errors emphasize the value of moored buoy observations for the surface CO₂ observing system. Still, all but one (JENA-MLS; ΔpCO₂ = −2.2 ± 48.7 μatm) of the mapped data products from SeaFlux exhibited more variability in their differences from buoy observations than even the version of RFR-LME constructed without moored buoy observations. JENA-MLS may perform better at representing pCO₂ at these mooring sites because it explicitly models mixed layer fluxes and processes rather than relying on empirical relationships learned from large sets of data.
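The comparison summarized above amounts to binning buoy observations by month, differencing them against the matching grid-cell values, and reporting the median and interquartile range. A minimal Python sketch follows; it assumes the reported spread is the full interquartile range (75th minus 25th percentile), and the function names and data values are hypothetical.

```python
import numpy as np


def monthly_bin_means(years, months, values):
    """Average all observations that fall in the same (year, month) bin."""
    bins = {}
    for y, m, v in zip(years, months, values):
        bins.setdefault((int(y), int(m)), []).append(float(v))
    return {key: float(np.mean(vals)) for key, vals in bins.items()}


def median_and_iqr(moored_monthly, gridded_monthly):
    """Median and IQR of (moored minus gridded) over the months both share."""
    keys = sorted(set(moored_monthly) & set(gridded_monthly))
    diff = np.array([moored_monthly[k] - gridded_monthly[k] for k in keys])
    q25, q50, q75 = np.percentile(diff, [25, 50, 75])
    return float(q50), float(q75 - q25)


# Hypothetical buoy pCO2 observations (uatm); two fall in January 2010.
moored = monthly_bin_means([2010, 2010, 2010, 2010],
                           [1, 1, 2, 3],
                           [400.0, 410.0, 380.0, 390.0])
# Hypothetical values from the co-located grid cell, keyed by (year, month).
gridded = {(2010, 1): 400.0, (2010, 2): 385.0,
           (2010, 3): 390.0, (2010, 4): 395.0}

median_diff, iqr_diff = median_and_iqr(moored, gridded)
print(median_diff, iqr_diff)
```

Months present in only one of the two series (April 2010 here) are ignored, so the statistics describe only months with both a buoy bin and a grid value.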

Table 9 Medians and interquartile ranges (μatm) of comparisons between moored buoy observations and corresponding grid cells from mapped monthly sea surface pCO₂ data products.

Individual timeseries from moored buoy sites (Fig. 12) emphasize the significant seasonal and interannual variability in buoy pCO₂ observations (black dots), even when aggregated in monthly bins, and the challenge for mapped products (colored lines) to accurately capture each of those variations at a local scale. The performance of the regional RFR-LME maps compared to the global mapped products reinforces the notion that locally specific relationships captured by training machine learning algorithms at the scale of objectively defined clusters within LMEs can resolve fine-scale variations in ocean biogeochemistry more effectively than global-scale algorithms, even though those global-scale algorithms are trained with a larger amount of data²⁴,⁶⁷. Positive trends in pCO₂ superimposed upon seasonal variations are visible in both the moored buoy observations and mapped data product timeseries (Fig. 12).

Comparison to mapped data products

Finally, RFR-LME surface pCO₂ was compared directly to the six global-scale mapped products of pCO₂ from SeaFlux across the overlapping interval between them (1998–2019). Maps of average surface pCO₂ display similar patterns across all six SeaFlux products, but differences between those products and RFR-LME (ΔpCO₂ = pCO_2(SeaFlux) − pCO_2(RFR-LME)) reveal subtle regional differences (Fig. 13). SeaFlux provides a pCO₂ filler field derived from Landschützer et al.⁶⁸ to fill spatial gaps in global surface products; this gap filler is not used to produce the difference maps displayed in Fig. 13. However, for spatial consistency, it is used to calculate the averages and standard deviations of the differences for each data product shown in Fig. 13.

In the tropical Pacific, RFR-LME maps agreed well with all products except NIES-FNN, which shows a prevailing negative bias. In the Atlantic, RFR-LME maps generally agreed well, with visible biases in the Mississippi plume (CSIR-ML6), Georges Bank (JMA-MLR), Caribbean (JENA-MLS), and throughout (NIES-FNN). Coastal negative biases are visible for most products in the central CCS region, and coastal positive biases are visible in the northern CCS region. Both positive and negative biases occur in the regions surrounding Alaska, where low observational density likely leads to significant diversity in pCO₂ estimates among the gap-filling approaches.

Despite these regional discrepancies with some individual products, the median (±1 IQR) ΔpCO₂ for the ensemble average of all six SeaFlux products is 0.8 ± 16.6 μatm. This indicates that RFR-LME, which represents local-scale temporal variability in surface pCO₂ more effectively than global products (Table 9; Fig. 11), agrees at broad scales with observation-based products that are well accepted and widely used by community-wide synthesis efforts such as the Global Carbon Budget² and REgional Carbon Cycle Assessment and Processes Project (RECCAP2)⁶⁹.

Code availability

Code for accessing and processing the data discussed in this study is freely available on GitHub (https://github.com/jonathansharp/US-RFR-LMEs). Code was written in MATLAB version R2022a. Parameters used to generate and validate the current dataset are described throughout the Methods section and are listed in Table 2.

References

IPCC et al. Changing Ocean, Marine Ecosystems, and Dependent Communities. IPCC Special Report on the Ocean and Cryosphere in a Changing Climate 447–588 (2019).
Friedlingstein, P. et al. Global Carbon Budget 2023. Earth System Science Data 15, 5301–5369 (2023).
Feely, R. A. et al. Acidification of the Global Surface Ocean: What We Have Learned from Observations. Oceanography 36, 120–129 (2023).
Ma, D., Gregor, L. & Gruber, N. Four Decades of Trends and Drivers of Global Surface Ocean Acidification. Global Biogeochemical Cycles 37, e2023GB007765 (2023).
Jiang, L.-Q. et al. Global Surface Ocean Acidification Indicators From 1750 to 2100. Journal of Advances in Modeling Earth Systems 15, e2022MS003563 (2023).
Gregor, L. & Gruber, N. OceanSODA-ETHZ: a global gridded data set of the surface ocean carbonate system for seasonal to decadal studies of ocean acidification. Earth System Science Data 13, 777–808 (2021).
Sutton, A. J. et al. Autonomous seawater pCO₂ and pH time series from 40 surface buoys and the emergence of anthropogenic trends. Earth System Science Data 11, 421–439 (2019).
Bates, N. et al. A time-series view of changing ocean chemistry due to ocean uptake of anthropogenic CO₂ and ocean acidification. Oceanography 27, 126–141 (2014).
Denvil-Sommer, A., Gehlen, M., Vrac, M. & Mejia, C. LSCE-FFNN-v1: A two-step neural network model for the reconstruction of surface ocean pCO₂ over the global ocean. Geoscientific Model Development 12, 2091–2105 (2019).
Iida, Y., Takatani, Y., Kojima, A. & Ishii, M. Global trends of ocean CO₂ sink and ocean acidification: an observation-based reconstruction of surface ocean inorganic carbon variables. J Oceanogr 77, 323–358 (2021).
Laruelle, G. G. et al. Global high-resolution monthly pCO₂ climatology for the coastal ocean derived from neural network interpolation. Biogeosciences 14, 4545–4561 (2017).
Roobaert, A., Regnier, P., Landschützer, P. & Laruelle, G. G. A novel sea surface pCO₂-product for the global coastal ocean resolving trends over the 1982–2020 period. Earth System Science Data Discussions 1–32, https://doi.org/10.5194/essd-2023-228 (2023).
Byrne, R. H., Mecking, S., Feely, R. A. & Liu, X. Direct observations of basin-wide acidification of the North Pacific Ocean. Geophysical Research Letters 37, 1–5 (2010).
Carter, B. R. et al. Pacific Anthropogenic Carbon Between 1991 and 2017. Global Biogeochemical Cycles 33, 597–617 (2019).
Caldeira, K. & Wickett, M. E. Anthropogenic carbon and ocean pH. Nature 425, 365 (2003).
Bopp, L. et al. Multiple stressors of ocean ecosystems in the 21st century: Projections with CMIP5 models. Biogeosciences 10, 6225–6245 (2013).
Kwiatkowski, L. et al. Twenty-first century ocean warming, acidification, deoxygenation, and upper-ocean nutrient and primary production decline from CMIP6 model projections. Biogeosciences 17, 3439–3470 (2020).
Feely, R. A. et al. Chemical and biological impacts of ocean acidification along the west coast of North America. Estuarine, Coastal and Shelf Science 183, 260–270 (2016).
National Marine Fisheries Service. Large marine ecosystems of the world: an annotated bibliography. https://doi.org/10.7289/V5/TM-NMFS-F/SPO-167 (2016).
Dai, M. et al. Carbon Fluxes in the Coastal Ocean: Synthesis, Boundary Processes, and Future Trends. Annu. Rev. Earth Planet. Sci. 50, 593–626 (2022).
Mackenzie, F., Andersson, A., Lerman, A. & Ver, L. Boundary exchanges in the global coastal margin: implications for the organic and inorganic carbon cycles. The sea 13, 193–225 (2005).
Hauri, C. et al. Spatiotemporal variability and long-term trends of ocean acidification in the California Current System. Biogeosciences 10, 193–216 (2013).
Laruelle, G. G., Lauerwald, R., Pfeil, B. & Regnier, P. Regionalized global budget of the CO₂ exchange at the air-water interface in continental shelf seas. Global Biogeochemical Cycles 28, 1199–1214 (2014).
Sharp, J. D., Fassbender, A. J., Carter, B. R., Lavin, P. D. & Sutton, A. J. A monthly surface pCO₂ product for the California Current Large Marine Ecosystem. Earth System Science Data 14, 2081–2108 (2022).
Chau, T.-T.-T., Gehlen, M., Metzl, N. & Chevallier, F. CMEMS-LSCE: A global 0.25-degree, monthly reconstruction of the surface ocean carbonate system. Earth System Science Data Discussions 1–52, https://doi.org/10.5194/essd-2023-146 (2023).
Bakker, D. C. E. et al. A multi-decade record of high-quality fCO₂ data in version 3 of the Surface Ocean CO₂ Atlas (SOCAT). Earth System Science Data 8, 383–413 (2016).
Fay, A. R. et al. SeaFlux: harmonization of air–sea CO₂ fluxes from surface pCO₂ data products using a standardized approach. Earth System Science Data 13, 4693–4710 (2021).
Friedlingstein, P. et al. Global Carbon Budget 2022. Earth System Science Data 14, 4811–4900 (2022).
Takahashi, T. et al. Climatological mean and decadal change in surface ocean pCO₂, and net sea–air CO₂ flux over the global oceans. Deep-Sea Research Part II 56, 24 (2009).
Rödenbeck, C. et al. Interannual sea-air CO₂ flux variability from an observation-driven ocean mixed-layer scheme. Biogeosciences 11, 4599–4613 (2014).
Iida, Y. et al. Trends in pCO₂ and sea–air CO₂ flux over the global open oceans for the last two decades. Journal of Oceanography 71, 637–661 (2015).
Landschützer, P. et al. A neural network-based estimate of the seasonal to inter-annual variability of the Atlantic Ocean carbon sink. Biogeosciences 10, 7793–7815 (2013).
Landschützer, P., Gruber, N., Bakker, D. C. E. & Schuster, U. Recent variability of the global ocean carbon sink. Global Biogeochemical Cycles 28, 927–949 (2014).
Carter, B. R. et al. New and updated global empirical seawater property estimation routines. Limnology and Oceanography: Methods https://doi.org/10.1002/lom3.10461 (2021).
Sharp, J. D. et al. RFR-LME Ocean Acidification Indicators from 1998 to 2022 (NCEI Accession 0287551). https://doi.org/10.25921/H8VW-E872 (2024).
Gregor, L., Lebehot, A. D., Kok, S. & Scheel Monteiro, P. M. A comparative assessment of the uncertainties of global surface ocean CO₂ estimates using a machine-learning ensemble (CSIR-ML6 version 2019a)-Have we hit the wall? Geoscientific Model Development 12, 5113–5136 (2019).
National Centers for Environmental Information. Surface Ocean CO₂ Atlas Database Version 2023 (SOCATv2023) (NCEI Accession 0278913).
Huang, B. et al. Improvements of the Daily Optimum Interpolation Sea Surface Temperature (DOISST) Version 2.1. Journal of Climate 34, 2923–2939 (2021).
European Union - Copernicus Marine Service. Global Ocean Ensemble Physics Reanalysis. Mercator Ocean International https://doi.org/10.48670/MOI-00024 (2019).
European Union - Copernicus Marine Service. Global Ocean Gridded L4 Sea Surface Heights And Derived Variables Reprocessed (1993-Ongoing). Mercator Ocean International https://doi.org/10.48670/MOI-00148 (2021).
NASA Ocean Biology Processing Group. Aqua MODIS Level 3 Mapped Chlorophyll Data, Version R2022.0. NASA Ocean Biology Distributed Active Archive Center https://doi.org/10.5067/AQUA/MODIS/L3M/CHL/2022 (2022).
NASA Ocean Biology Processing Group. OrbView-2 SeaWiFS Global Mapped Chlorophyll (CHL) Data, version R2022.0. https://doi.org/10.5067/ORBVIEW-2/SEAWIFS/L3M/CHL/2022 (2022).
Copernicus Climate Change Service. ERA5 monthly averaged data on single levels from 1979 to present. ECMWF https://doi.org/10.24381/CDS.F17050D7 (2019).
NOAA National Centers for Environmental Information. ETOPO 2022 15 Arc-Second Global Relief Model. NOAA National Centers for Environmental Information https://doi.org/10.25921/FD45-GT74 (2022).
Kanamitsu, M. et al. NCEP–DOE AMIP-II Reanalysis (R-2). Bull. Amer. Meteor. Soc. 83, 1631–1644 (2002).
NOAA ESRL GML CCGG Group. Earth System Research Laboratory Carbon Cycle and Greenhouse Gases Group Flask-Air Sample Measurements of CO₂ at Global and Regional Background Sites, 1967-Present. NOAA ESRL GML CCGG Group https://doi.org/10.15138/WKGJ-F215 (2019).
Gregor, L. et al. SeaFlux v2023: harmonised sea-air CO₂ fluxes from surface pCO₂ data products using a standardised approach. Zenodo https://doi.org/10.5281/ZENODO.8280457 (2023).
Chau, T., Gehlen, M. & Chevallier, F. Global Ocean Surface Carbon Product MULTIOBS_GLO_BIO_CARBON_SURFACE_REP_015_008. Update 2, 09 (2020).
Jersild, A., Landschützer, P., Gruber, N. & Bakker, D. C. E. An observation-based global monthly gridded sea surface pCO₂ and air-sea CO₂ flux product from 1982 onward and its monthly climatology (NCEI Accession 0160558). NOAA National Centers for Environmental Information https://doi.org/10.7289/V5Z899N6 (2017).
Zeng, J., Nojiri, Y., Landschützer, P., Telszewski, M. & Nakaoka, S. A global surface ocean fCO₂ climatology based on a feed-forward neural network. Journal of Atmospheric and Oceanic Technology 31, 1838–1849 (2014).
Lauvset, S. K. et al. GLODAPv2.2022: the latest version of the global interior ocean biogeochemical data product. Earth Syst. Sci. Data 14, 5543–5572 (2022).
Jiang, L. et al. Coastal Ocean Data Analysis Product in North America (CODAP-NA) – an internally consistent data product for discrete inorganic carbon, oxygen, and nutrients on the North American ocean margins. 1–23 (2021).
McLachlan, G. J. & Peel, D. Finite Mixture Models. (Wiley, New York, 2000).
Breiman, L. Random forests. Machine learning 45, 5–32 (2001).
Weiss, R. F. Carbon dioxide in water and seawater: the solubility of a non-ideal gas. Marine Chemistry 2, 203–215 (1974).
Sharp, J. D. et al. CO2SYSv3 for MATLAB. Zenodo https://doi.org/10.5281/ZENODO.3950562 (2023).
Dickson, A. G., Sabine, C. L. & Christian, J. R. Guide to Best Practices for Ocean CO₂ Measurements. PICES Special Publication 3 (North Pacific Marine Science Organization, Sidney, B.C., Canada, 2007).
Lewis, E. & Wallace, D. W. R. CO2SYS-Program developed for the CO₂ system calculations. Carbon Dioxide Information Analysis Center Report ORNL/CDIAC-105 (1998).
Lueker, T. J., Dickson, A. G. & Keeling, C. D. Ocean pCO₂ calculated from dissolved inorganic carbon, alkalinity, and equations for K₁ and K₂: validation based on laboratory measurements of CO₂ in gas and seawater at equilibrium. Marine Chemistry 70, 105–119 (2000).
Dickson, A. G. Thermodynamics of the dissociation of boric acid in synthetic seawater from 273.15 to 318.15 K. Deep Sea Research Part A, Oceanographic Research Papers 37, 755–766 (1990).
Lee, K. et al. The universal ratio of boron to chlorinity for the North Pacific and North Atlantic oceans. Geochimica et Cosmochimica Acta 74, 1801–1811 (2010).
Dickson, A. G. Standard potential of the reaction: AgCl(s) + ½H₂(g) = Ag(s) + HCl(aq), and the standard acidity constant of the ion HSO₄⁻ in synthetic sea water from 273.15 to 318.15 K. The Journal of Chemical Thermodynamics 22, 113–127 (1990).
Perez, F. F. & Fraga, F. Association constant of fluoride and hydrogen ions in seawater. Marine Chemistry 21, 161–168 (1987).
Orr, J. C., Epitalon, J., Dickson, A. G. & Gattuso, J. Routine uncertainty propagation for the marine carbon dioxide system. Marine Chemistry 207, 84–107 (2018).
Fong, M. B. & Dickson, A. G. Insights from GO-SHIP hydrography data into the thermodynamic consistency of CO₂ system measurements in seawater. Marine Chemistry 211, 52–63 (2019).
García-Ibáñez, M. I. et al. Gaining insights into the seawater carbonate system using discrete fCO₂ measurements. Marine Chemistry 245, 104150 (2022).
Duke, P. J. et al. Estimating marine carbon uptake in the northeast Pacific using a neural network approach. Biogeosciences 20, 3919–3941 (2023).
Landschützer, P., Laruelle, G. G., Roobaert, A. & Regnier, P. A combined global ocean pCO₂ climatology combining open ocean and coastal areas (NCEI Accession 0209633). NOAA National Centers for Environmental Information https://doi.org/10.25921/QB25-F418 (2020).
Ciais, P. et al. Definitions and methods to estimate regional land carbon fluxes for the second phase of the REgional Carbon Cycle Assessment and Processes Project (RECCAP-2). Geoscientific Model Development 15, 1289–1316 (2022).

Acknowledgements

This research was supported by the NOAA Ocean Acidification Program (OAP; https://ror.org/02bfn4816) under the project “Temporal changes of ocean acidification indicators in the U.S. large marine ecosystems (LMEs) - an operational data product at NOAA/NCEI,” project ID 21925, as part of the initiative to bolster NOAA’s National Marine Ecosystem Status effort. Additional funding for JDS and BRC was from the Cooperative Institute for Climate, Ocean, & Ecosystem Studies (CICOES) under NOAA cooperative agreement no. NA20OAR4320271. Additional funding for L-QJ, PDL, and HY was from NOAA National Centers for Environmental Information (NCEI) through a NOAA Cooperative Institute for Satellite Earth System Studies (CISESS) Grant (NA19NES4320002) at the Earth System Science Interdisciplinary Center (ESSIC), University of Maryland. BRC also thanks the Carbon Data Management and Synthesis Grant from NOAA’s Global Ocean Monitoring and Observation division (GOMO), Fund Ref. No. 100007298 for supporting his contributions to this project. This is CICOES contribution no. 2024-1342 and PMEL contribution no. 5593. The Surface Ocean CO₂ Atlas (SOCAT) is an international effort, endorsed by the International Ocean Carbon Coordination Project (IOCCP), the Surface Ocean Lower Atmosphere Study (SOLAS) and the Integrated Marine Biosphere Research (IMBeR) program, to deliver a uniformly quality-controlled surface ocean CO₂ database. The many researchers and funding agencies responsible for the collection of data and quality control are thanked for their contributions to SOCAT. NOAA OISST V2 High Resolution Dataset data and NCEP/DOE Reanalysis II data were provided by the NOAA PSL, Boulder, Colorado, USA, from their website at https://psl.noaa.gov. This study has been conducted using E.U. Copernicus Marine Service Information; https://doi.org/10.48670/moi-00024, https://doi.org/10.48670/moi-00048, https://doi.org/10.24381/cds.f17050d7.
Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains. NASA sea surface chlorophyll data were provided by the NASA Ocean Biology Processing Group (OBPG) and the NASA Ocean Biology Distributed Active Archive Center (OB.DAAC). Atmospheric CO₂ data were prepared and provided by the NOAA GML Carbon Cycle Group. We thank the NOAA Ecosystem Indicators Working Group members, including Willem Klajbor (NOAA/AOML), Chris Kelbe (NOAA/AOML), Erica Towle (NOAA/Coral Reef Conservation Program), and Xiao Liu (NOAA/GFDL), as well as Kaitlin Goldsmith (NOAA/OAP) and other stakeholders for providing comments that helped improve the web interface. We thank Tim Boyer for leading the proposal securing funding for this work and for providing comments and feedback on the manuscript. Zachary Strasberg (University of New Mexico), a summer intern with NOAA OAP, performed computational simulations that informed our error estimates and the results of which may be incorporated into future releases of this data product.

Author information

Authors and Affiliations

Cooperative Institute for Climate, Ocean, and Ecosystem Studies, University of Washington, Seattle, WA, 98195, USA
Jonathan D. Sharp & Brendan R. Carter
NOAA/OAR Pacific Marine Environmental Laboratory, Seattle, WA, 98115, USA
Jonathan D. Sharp & Brendan R. Carter
Cooperative Institute for Satellite Earth System Studies, Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD, 20740, USA
Li-Qing Jiang, Paige D. Lavin & Hyelim Yoo
NOAA/NESDIS National Centers for Environmental Information, Silver Spring, MD, 20910, USA
Li-Qing Jiang & Hyelim Yoo
NOAA/NESDIS Center for Satellite Applications and Research, College Park, MD, 20740, USA
Paige D. Lavin
NOAA/NESDIS National Centers for Environmental Information, Charleston, SC, 29412, USA
Scott L. Cross

Authors

Jonathan D. Sharp
Li-Qing Jiang
Brendan R. Carter
Paige D. Lavin
Hyelim Yoo
Scott L. Cross

Contributions

JDS contributed to the proposal securing funding for this work and led the coding, figure generation, and writing efforts. L-QJ drafted the initial version of the proposal securing funding for this work, managed the overall project and the budget at CISESS (UMD), and provided comments and feedback on the manuscript. BRC contributed to the proposal securing funding for this work, managed the project at UW CICOES, and provided comments and feedback on the manuscript. PDL contributed to the proposal securing funding for this work, assisted with the machine learning model selection and tuning, and provided comments and feedback on the manuscript. HY provided feedback on the code, provided comments and feedback on the manuscript, and has led the transfer of code to a cloud environment in preparation for automated creation of the OA indicators. SLC contributed to the proposal securing funding for this work, interfaced with the NOAA Ecosystem Indicators Working Group, and provided comments and feedback on the manuscript.

Corresponding author

Correspondence to Jonathan D. Sharp.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sharp, J.D., Jiang, LQ., Carter, B.R. et al. A mapped dataset of surface ocean acidification indicators in large marine ecosystems of the United States. Sci Data 11, 715 (2024). https://doi.org/10.1038/s41597-024-03530-7

Received: 29 February 2024
Accepted: 14 June 2024
Published: 02 July 2024
DOI: https://doi.org/10.1038/s41597-024-03530-7
Springer Nature Limited
