Main

Coastal margins play a key role in the global carbon (C) cycle, constituting <10% of the entire ocean area, but contributing >90% to the overall organic carbon (OC) burial in global oceans1,2. Coastal margins serve as the interface between terrestrial and oceanic C pools and receive a diverse mixture of organic compounds from both allochthonous terrestrial inputs and autochthonous marine primary production3,4. However, OC with different origins exhibits distinct signatures, reactivities and ages4,5, therefore responding differently to remobilization and alteration processes in coastal oceans5,6. For instance, rivers transport ~200 megatonnes of particulate OC per year to oceans7, of which 55–80% is remineralized along continental margins3. Furthermore, anthropogenic activities have substantially altered regional and global C cycling (for example, production, transport, preservation and burial of OC) along river–ocean continuums8,9. Therefore, obtaining accurate information about the nature and burial of OC in global coastal sediments is challenging but crucial for developing robust C budgets and predicting future C dynamics.

Radiocarbon (Δ14C) of OC has emerged as a powerful tool for investigating contemporary aquatic C biogeochemistry10,11,12, offering insights into the average age of the local OC mixed with various organic compounds of different ages4. More-negative Δ14C values (that is, 14C-depleted) indicate older 14C ages (that is, a predominance of old OC, possibly mixed with some modern OC) and vice versa. The Δ14C value not only serves as a tracer of OC sources with diverse ages (for example, 14C-enriched young photosynthetic C, 14C-depleted aged petrogenic C and pre-aged C somewhere in between)4,12, but also provides a window on the dynamics of C transfer between surface C reservoirs on Earth13. Recent studies have elucidated basin-scale variations of Δ14C in some riverine and marine C pools (for example, refs. 14,15) and attempts have been made to compile available Δ14C information to obtain their global patterns13,16,17,18,19,20,21.

However, although the number of Δ14C measurements has increased greatly, available data cover only a fraction of global river basins and coastal zones and are unevenly distributed spatially, with frequent measurements at some locations and none in others (Extended Data Fig. 1). So far, a high-resolution dataset of Δ14C with global coverage is lacking for both river particles and coastal sediments. Moreover, available observations in river basins and coastal sediments are usually disconnected, complicating a systematic analysis of Δ14C along river–ocean continuums. Scaling up unevenly distributed site-level Δ14C observations to the global scale at a high spatial resolution for both global rivers and coastal zones is imperative to better constrain the global C cycle in a spatially explicit manner.

Machine learning has emerged lately as a powerful tool for scaling up site-level observations to high-resolution global patterns, particularly in the fields of Earth system science22, terrestrial C and nitrogen biogeochemistry23,24 and marine sediment geochemistry25,26. These machine learning applications offer a new approach to the challenging high-resolution mapping of Δ14C values in global river–ocean continuums.

In this Article, we outline the compilation of Δ14C data for 2,559 observations (737 locations) of riverine particles and 1,325 observations (1,325 locations) of coastal surface sediments (depths <5 cm) worldwide (Extended Data Fig. 1) from published literature and databases, including Circum-Arctic Sediment Carbon Database (CASCADE), Modern River Archives of Particulate Organic Carbon (MOREPOC) and Modern Ocean Sediment Archive and Inventory of Carbon (MOSAIC)19,20,21 (Methods). We also compiled extensive data for environmental variables known to regulate C delivery and accumulation (Supplementary Table 1). These data were used to train and test machine learning models (Fig. 1) and to generate a high-resolution global map of Δ14C values in river–ocean continuums. We also performed machine learning simulations for total OC (TOC) contents and δ13C values of global coastal sediments. Using a combination of the high-resolution global patterns of TOC contents and Δ14C and δ13C values, we identified accumulation hotspots for modern and aged OC from marine and terrestrial sources in coastal oceans worldwide.

Fig. 1: Environmental variables driving Δ14C values of riverine particles and coastal sediments.

a–f, Importance of environmental variables in driving Δ14C values of riverine particles (a–c) and coastal sediments (d–f) based on correlation analysis (a and d), the random forest approach (b and e) and plots of overall prediction performance using the feature-selected SVM for Δ14C values (c and f). *P < 0.05; **P < 0.01; ***P < 0.001. The percentage of increase in the mean square error (MSE) represents the importance of environmental variables in driving Δ14C using the random forest approach. Belowground NPP_20, belowground net primary productivity at 20 cm depth; Belowground NPP_200, belowground net primary productivity at 200 cm depth; R2, coefficient of determination; MAE, mean absolute error.

Global mapping of OC Δ14C values in river–ocean continuums

Using the machine learning approach and compiled data, we generated a high-resolution Δ14C atlas for river particles and coastal sediments that agrees well with observations (Methods and Supplementary Discussion). This atlas provides high-spatial-resolution Δ14C values with complete global coverage (Fig. 2). We further converted Δ14C values into 14C ages to obtain high-resolution global patterns of 14C ages (Extended Data Fig. 2 and Methods).
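As an illustration of the conversion step, the sketch below turns a Δ14C value into a conventional 14C age. The exact procedure is given in the Methods and is not reproduced in this section, so the code assumes the standard convention: fraction modern ≈ 1 + Δ14C/1,000 and the Libby mean life of 8,033 yr. The function name and the neglect of the small correction for the year of measurement are assumptions made for illustration.

```python
import math

LIBBY_MEAN_LIFE_YR = 8033.0  # Libby mean life used for conventional 14C ages


def delta14c_to_age(delta14c_permil):
    """Approximate conventional 14C age (yr) from a Δ14C value (‰).

    Assumption: fraction modern is taken as 1 + Δ14C/1000, ignoring the
    decay correction between 1950 and the year of measurement.
    """
    fraction_modern = 1.0 + delta14c_permil / 1000.0
    if fraction_modern <= 0.0:
        raise ValueError("Δ14C must be greater than -1000‰")
    # More-negative Δ14C gives a smaller fraction modern and an older age.
    return -LIBBY_MEAN_LIFE_YR * math.log(fraction_modern)
```

Under these assumptions, a Δ14C value of −500‰ corresponds to roughly 5,570 14C yr, and positive Δ14C values return negative ages, which are normally reported as modern.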

Fig. 2: Global distributions and latitudinal patterns of predicted Δ14C values in riverine particles and coastal sediments.

a–d, Global distributions (a and c) and latitudinal patterns (b and d) of predicted Δ14C values in riverine particles (a and b) and coastal sediments (c and d). The map in a is shown at river orders 1–7, as defined by the classical ordering system of HydroBASINS (https://www.hydrosheds.org/products/hydrobasins) and includes 22,442 predicted values (Methods). The map in c is shown at a spatial resolution of 10′ × 10′ and includes 99,807 predicted values (Methods). The coloured lines in b and d represent mean values and the grey shading represents s.d.

The predicted Δ14C values of global riverine particles are spatially variable and change sharply with latitude (Fig. 2a,b). Positive or less-negative Δ14C values are predominantly located in South America, Africa and Southern and Southeastern Asia (Fig. 2a). The most-negative Δ14C values are mainly found in the Arctic permafrost region, highlands such as the Qinghai–Tibet and Mongolian Plateaus and mountainous regions such as Taiwan Island and the Western United States (Fig. 2a). Along the latitudinal gradient, Δ14C values are more negative in high-latitude regions (beyond 60° N or 60° S) and less negative (or more positive) in low-latitude regions (between 30° N and 30° S) (Fig. 2b).

This spatial variability also exists in the predicted Δ14C values of global coastal sediments (Fig. 2c,d). Positive or less-negative Δ14C values are primarily in the Subarctic shelf, Sunda shelf, Caribbean Sea and parts of the African coast (Fig. 2c). The most-negative Δ14C values are mainly in wide shelf areas such as the Arctic shelf, East China Sea and northern shelf of Australia, as well as near mouths of rivers such as the Amazon, Huanghe, Indus, Mississippi and Irrawaddy estuaries (Fig. 2c). The latitudinal trend of Δ14C of global coastal sediments is similar to that of global riverine particles, with more-negative values in high-latitude regions and less-negative values in low-latitude regions (Fig. 2d).
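The latitudinal patterns in Fig. 2b,d summarize the predictions as a mean and s.d. per latitude. A minimal sketch of such a summary is given below; the band width of 10°, the use of the population s.d. and the function name are illustrative assumptions rather than the settings used for the figure.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def latitudinal_profile(points, band_deg=10.0):
    """Summarize (latitude, value) pairs per latitude band.

    Returns {southern edge of band: (mean, population s.d.)}.
    The band width is an assumed, adjustable parameter.
    """
    bands = defaultdict(list)
    for lat, value in points:
        # Floor division keeps negative (southern) latitudes in the right band.
        south_edge = math.floor(lat / band_deg) * band_deg
        bands[south_edge].append(value)
    return {edge: (mean(vals), pstdev(vals)) for edge, vals in sorted(bands.items())}
```

Applied to predicted Δ14C values, the output would be expected to show more-negative band means at high latitudes than at low latitudes, in line with the trends described above.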

The predictive uncertainty of Δ14C values in global riverine particles ranges from 0.01 to 22.74%, with an average of 3.15 ± 2.27% (Fig. 3a). Similarly, the predictive uncertainty of Δ14C values in coastal sediments ranges from 0.13 to 15.34%, with an average of 2.04 ± 2.70% (Fig. 3c). One important reason for the predictive uncertainties is the low spatial density of sampling, together with the high spatial variability of observed Δ14C values. High predictive uncertainties occur in regions with limited or no observations (for example, high-altitude or high-latitude regions, such as the Arctic continent and Subarctic coastal ocean; Fig. 3a,c and Extended Data Fig. 1a,b). In addition, Δ14C values may vary with river discharge13,17, while small, mountainous rivers in the mid-latitudes (for example, Taiwan Island and Southeast Asia) are under-sampled at high-discharge times (Fig. 3a and Extended Data Fig. 1). Along the latitudinal gradient, the predictive Δ14C uncertainties for riverine particles are higher in high-latitude regions and much lower in low-latitude regions (Fig. 3b). For coastal sediments, the latitudinal variation in uncertainties is characterized by higher values in the Northern Hemisphere and lower values in the Southern Hemisphere (Fig. 3d).

Fig. 3: Global distributions and latitudinal patterns of predicted uncertainty of Δ14C values in riverine particles and coastal sediments.

a–d, Global distributions (a and c) and latitudinal patterns (b and d) of predicted uncertainty in Δ14C values in riverine particles (a and b) and coastal sediments (c and d). The map in a is shown at river orders 1–7, as defined by the classical ordering system of HydroBASINS (https://www.hydrosheds.org/products/hydrobasins) and includes 22,442 predicted values (Methods). The map in c is shown at a spatial resolution of 10′ × 10′ and includes 99,807 predicted values (Methods). The coefficient of variation (CV) is used to represent the uncertainty of machine learning models. The coloured lines in b and d represent mean values and the grey shading represents s.d.
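The CV used as the uncertainty measure can be written as the s.d. of repeated predictions divided by the absolute value of their mean, expressed as a percentage. The sketch below assumes that several predictions are available for each location (for example, from repeated model runs); how these repeated predictions were generated is not described in this section, and the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(predictions):
    """CV (%) of repeated predictions for one location: 100 * s.d. / |mean|.

    Assumption: `predictions` holds at least two values from repeated runs.
    """
    mu = mean(predictions)
    if mu == 0:
        raise ValueError("CV is undefined when the mean prediction is zero")
    # The absolute value keeps the CV positive for negative Δ14C means.
    return 100.0 * stdev(predictions) / abs(mu)
```

For instance, three predictions of −190‰, −200‰ and −210‰ at one site give a CV of 5%.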

Overall, the utilization of machine learning techniques, together with available observations, has enabled high-resolution prediction of Δ14C values in global river–ocean continuums, with uncertainty, accuracy and spatial coverage that compare favourably with existing databases19,20,21. The global patterns of Δ14C values in riverine particles and coastal sediments produced in this study can be applied to investigation of the C cycle regionally and globally.

Critical modes of river–coastal OC 14C ages worldwide

Four distinct modes emerged in this study (Fig. 4): the old–young mode describes old 14C ages in riverine particles accompanied by young 14C ages in corresponding coastal sediments; the young–old mode describes young 14C ages in riverine particles coupled with old 14C ages in corresponding coastal sediments; the young–young mode describes young 14C ages in both riverine particles and corresponding coastal sediments; and the old–old mode describes old 14C ages in both riverine particles and corresponding coastal sediments.
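The four modes amount to a two-way classification of paired river and coastal 14C ages. The sketch below makes this logic explicit; the 1,000 yr cut-off between young and old is a hypothetical placeholder, as this section does not state a numerical threshold, and the function name is likewise illustrative.

```python
def classify_mode(river_age_yr, coastal_age_yr, old_threshold_yr=1000.0):
    """Return the (river, coast) age mode as a pair of labels.

    `old_threshold_yr` is an assumed cut-off, not a value from the study.
    The first label refers to riverine particles, the second to the
    corresponding coastal sediments.
    """
    river = "old" if river_age_yr >= old_threshold_yr else "young"
    coast = "old" if coastal_age_yr >= old_threshold_yr else "young"
    return river, coast
```

A river with particles aged 3,000 yr draining to a coast with sediments aged 400 yr would be labelled ("old", "young") under this placeholder threshold.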

Fig. 4: Global distribution of 14C ages in riverine particles and coastal sediments worldwide and for a range of typical sub-regions.

a–e, Global distribution of 14C ages in both riverine particles and coastal sediments worldwide (a) and magnified for a range of typical sub-regions: the Arctic (b), Subarctic (c), Amazon river–ocean continuum (d) and Sunda river–ocean continuum (e). Panels b–d are magnified from the sections outlined by dashed boxes in a. The global map in a shows river orders 1–7, as defined by the classical ordering system of HydroBASINS (https://www.hydrosheds.org/products/hydrobasins) and includes 22,442 predicted riverine particle values (Methods). For coastal sediments, 99,807 predicted values are included in a at a spatial resolution of 10′ × 10′ (Methods). The predicted 14C ages were converted from the predicted Δ14C values in Fig. 2 following the approach described in the Methods.

The old–young mode is common in the Subarctic river–ocean continuum (Fig. 4c), western coast of the United States and coast of South Africa (Fig. 4a and Table 1). The annual export of riverine particles of old 14C ages (mainly petrogenic C) to these coastal regions is limited3, with pulsed transport and burial of riverine OC occurring over days to weeks, associated with storms but possibly not captured in the sampling dates3. The young (that is, recently produced) OC in these coastal sediments may mainly originate from marine primary productivity27 (Supplementary Fig. 1). The hotspots of young 14C ages on the western coast of the United States (for example, offshore California), coast of South Africa (for example, coast of Cape Peninsula) and Subarctic regions (for example, Shelikhov Gulf and Sakhalinskiy Bay) match the locations of reported upwelling zones28,29,30, where upwelling of nutrient-rich water stimulates in situ primary productivity27 (Supplementary Figs. 1 and 2). However, in the Subarctic river–ocean continuum the predicted 14C ages are highly uncertain (Fig. 3c) due to the scarcity of local observations.

Table 1 Five modes of OC 14C ages in river–ocean continuums worldwide, their typical regions and their role in the contemporary coastal C cycle

The young–old mode occurs primarily in low-latitude regions, such as the Amazon, Congo and Fly river–ocean continuums (Fig. 4d). These systems have extensive floodplains with large standing stocks of biomass, which can be transported by rivers to coastal zones during flooding periods31. This pattern also occurs in some subtropical to temperate river–ocean continuums, such as the Changjiang, Indus, Irrawaddy and Mississippi and their estuaries (Fig. 4a and Table 1). Rivers in these regions typically receive OC from terrestrial ecosystems and have large in situ algal production; high temperature and high precipitation enhance terrestrial and freshwater primary production (Supplementary Fig. 3). Eglinton et al.18 proposed that the ages of riverine biospheric C are positively correlated with the turnover time and 14C ages of soil OC. Negative relationships have been observed globally between the logarithmic OC turnover time and temperature or precipitation32,33. Thus, soil OC in low-latitude regions (with high temperature and high precipitation) tends to have shorter turnover times and younger 14C ages31,34. In their corresponding coastal zones, marine primary productivity may also be high due to large riverine nutrient inputs (Supplementary Fig. 1). However, 14C ages in these coastal sediments are old and do not reflect the young 14C ages in river particles or the high marine primary production. This is because the young terrestrial, freshwater and marine OC is preferentially degraded in the highly dynamic shallow environments with extensive sediment reworking, as is reflected by their thick sediment mixed layers (SMLs)26; for example, a maximum SML thickness of ~200 cm in the Amazon Estuary and large SML thicknesses of ~30 cm in other aforementioned coastal regions (Supplementary Fig. 4). The thick SMLs are primarily attributed to physical perturbation and bioturbation, which cause continuous resuspension–redeposition loops.

The young–young mode primarily occurs in low-latitude regions, typically in the Sunda river–ocean continuum (Fig. 4e), Caribbean coastal regions, western coast of Mexico and South China Sea (Fig. 4a and Table 1). As for the young–old mode, rivers in these regions receive large amounts of OC from terrestrial primary production and have high freshwater production, leading to young 14C ages for river particles. These rivers also transport substantial nutrients from land to sea, enhancing marine primary production (Supplementary Figs. 1 and 5). The 14C ages in their corresponding coastal sediments are also young, which differs from those in the young–old mode. This may be due to the lower hydrodynamics in these coastal zones, resulting in rather stable sedimentary environments and thin SMLs (Supplementary Fig. 4). Therefore, younger OC originating from both autochthonous (for example, phytoplankton detritus) and allochthonous sources (for example, plant debris and phytoplankton detritus) can be effectively deposited and preserved. Moreover, such stable hydrodynamics triggers frequent hypoxic events through oxygen consumption during OC decay and limitations in water–sediment exchange35. The hypoxic conditions may further limit bioturbation and microbial respiration and contribute to the preservation of young OC35,36, as can be seen in the western coast of Mexico, Caribbean coastal regions and Northwest European Shelf (Fig. 4a and Supplementary Fig. 6).

The old–old mode can be divided into two sub-modes: the old–old A mode, which is common in the Arctic river–ocean continuum (Fig. 4b) with a wide shelf; and the old–old B mode in mountainous river–ocean continuums in Taiwan Island and eastern Australia with a narrow shelf (Fig. 4a and Table 1). In the old–old A mode, despite high uncertainties in the predicted 14C ages in Arctic rivers due to limited data coverage (Extended Data Fig. 1), the 14C ages of river particles exported by Arctic rivers are overall older than those in mid- to low latitudes. Such old riverine 14C ages are primarily attributed to the 14C-depleted soils, such as permafrost and Yedoma in Arctic river basins34. For example, a recent study37 showed that the majority of terrestrial OC in the circum-Arctic region originates from near-surface soils (61%) and permafrost (30%). Although another recent study suggests that the warming Arctic may enhance the export of riverine aquatic biomass production38, it is important to recognize that most of this biomass produced in aquatic environments may be degraded during cross-shelf transport39. The marine primary production in Arctic coastal regions tends to be low because of low temperature, low nutrient inputs and low water transparency (Supplementary Figs. 3, 5 and 7). In the old–old B mode, the old 14C ages in coastal sediments are attributed to the substantial input of aged petrogenic OC transported by mountainous rivers draining areas of high erodibility40,41.

Implications for coastal OC accumulation and present CO2

Previous studies have explored the global distribution of sedimentary OC in coastal margins25,42,43, primarily focusing on OC content. Information regarding OC source, composition and burial potential remains very limited. Traditionally, OC accumulation is discussed in terms of the balance between OC input/production on the one hand and OC degradation/export on the other. OC accumulating in coastal systems is then considered a sink of carbon dioxide (CO2). However, such a mass balance approach largely ignores the source and ageing of OC, and this might bias the inferred implications for contemporary atmospheric CO2 levels14.

To further elucidate coastal C dynamics, the TOC contents and δ13C values of global coastal sediments were predicted using similar machine learning techniques (Methods), with satisfactory performance in model training and testing (Supplementary Table 2). Using a combination of our high-resolution data of global coverage for TOC contents, δ13C values and 14C ages, we identified the hotspots of OC accumulation potential in coastal oceans worldwide by accounting for not only OC quantity, but also OC source, composition and ambient environmental conditions. The hotspots of high coastal OC content reported by Bianchi et al.44 were also found in our study (Extended Data Fig. 3a). Moreover, we identified three different types of hotspots of coastal OC accumulation in surface sediments (hereafter referred to as OC accumulation).

Coastal regions with both high TOC content and the old–young riverine–coastal 14C age mode are the primary hotspots for young OC accumulation (Table 1 and Extended Data Fig. 3a). These regions exhibit less-negative δ13C values (Extended Data Fig. 3c), indicating that their OC primarily originates from autochthonous marine primary production. This OC origin is also supported by the spatial overlap of OC accumulation hotspots with upwelling zones (supporting high primary production; Supplementary Figs. 1 and 2) and oxygen minimum zones (supporting efficient young OC preservation; Supplementary Fig. 6)36. Accumulation of young OC in these hotspots has a positive effect on the removal of contemporary atmospheric CO2. This result is also consistent with the sea–air CO2 flux synthesized by Roobaert et al.45, in which the southeastern coast of Australia and coast of South Africa are a sink of atmospheric CO2 (Supplementary Fig. 8).

Coastal oceans with high TOC content and the young–old riverine–coastal 14C age mode are primary hotspots for old OC (pre-aged OC and aged petrogenic OC) accumulation (Table 1 and Extended Data Fig. 3a). The δ13C values of OC in these coastal sediments are predominantly more negative (Extended Data Fig. 3c), indicating that the OC buried here primarily originates from terrestrial inputs. This is supported by the results of n-alkanes in the Changjiang Estuary46, as well as the more frequently detected petroleum (aged petrogenic OC) compared with biogenic hydrocarbons by mooring observations in northeastern Australia47. The highly dynamic conditions due to physical disturbances (reflected by thick SMLs) induce degradation of labile OC (both young riverine material and locally produced marine OC) in these coastal regions3,5, as is reflected by their coastal CO2 effluxes to the atmosphere (particularly the Amazon Estuary and northeastern Australia; Supplementary Fig. 8). Consequently, the coastal accumulation of old OC does not contribute to contemporary atmospheric CO2, but the degradation of young OC causes CO2 emissions, collectively forming a source of contemporary atmospheric CO2.

Organic-rich sediments in the old–old A mode are another hotspot of old OC accumulation. In the Arctic region, the OC in coastal sediments is mainly from the erosion of permafrost37, along with minor inputs from aquatic biomass that would probably degrade during cross-shelf transport39. This is re-affirmed by the more-negative δ13C values in these coastal sediments (Extended Data Fig. 3c) and biomarker evidence on the predominant terrestrial source and in situ marine OC degradation48. Moreover, the local marine primary production is low (Supplementary Fig. 1) because of low temperature, limited nutrient inputs and low water transparency (Supplementary Figs. 3, 5 and 7). However, not all terrestrial-sourced old OC along these river–coastal continuums is preserved. Molecular degradation proxies indicate that ~1.7 Gg yr−1 of old OC in the Arctic shelf is degraded during cross-shelf transport49, making systems of this type negatively impact the removal of contemporary atmospheric CO2.

Furthermore, we also identify regions characterized by relatively low TOC content and the young–young riverine–coastal 14C age mode, whose roles in the contemporary C cycle are variable (Table 1 and Extended Data Fig. 3a). The relatively less-negative δ13C values of OC in most of these regions indicate a dominant source from marine primary production, except for the nearshore South China Sea with relatively more-negative δ13C values (Extended Data Fig. 3c). This is supported by the combination of terrestrial- and marine-sourced OC in the South China Sea shown by multi-proxy molecular biomarker analyses50. The OC in riverine particles in these tropical regions is mainly composed of plant and phytoplankton debris18,51, thus showing young 14C ages (Fig. 4). This riverine input of young OC, together with young OC from marine primary production, contributes to the young coastal OC in the relatively stable sedimentary environment. These young–young systems can represent a sink or source for atmospheric CO2; for instance, the sea–air CO2 flux density atlas shows a CO2 sink of the northern shelf of the South China Sea (preservation of terrestrial-/riverine-produced OC) and a CO2 source of the Sunda shelf (consumption of terrestrial OC; Supplementary Fig. 8). Irrespective of whether the C is imported or locally produced, the role of these regions as the source or sink of atmospheric CO2 depends on the OC balance of the local system, which is vulnerable to global warming because of the labile nature of the young OC. In warmer waters, the temperature-dependent metabolic rates of heterotrophic bacteria increase, thereby accelerating remineralization of the young OC. This process has been used to explain the elevated organic matter recycling efficiency and decreased OC burial in warm climates52.

In contrast, coastal margins characterized by low TOC content and the old–old B riverine–coastal 14C age mode (Table 1) accumulate only a small amount of C and play a minor role in contemporary CO2. For instance, OC in the mountainous river–ocean continuums in Taiwan Island and eastern Australia mainly originates from weathering and erosion of bedrock40,53. In the coastal zone of eastern Australia, the in situ marine OC production is very low (Supplementary Fig. 1). In the Taiwan Strait, the relatively enriched 13C and old 14C ages, together with low TOC contents (Extended Data Fig. 3a,b), indicate a dominant contribution of aged petrogenic OC from Taiwan Island40, rather than marine OC, which rapidly degrades in the highly dynamic environment during transport (Supplementary Figs. 1 and 2). Degradation of old (and sometimes young) OC lowers the TOC contents in these regions, with a slightly negative impact on removing contemporary atmospheric CO2.

Our study provides new insights into the spatial patterns of global coastal OC accumulation potential by combining machine learning approaches with comprehensive observational data for Δ14C, δ13C and TOC and their environmental drivers. The high-resolution global maps of Δ14C values and OC fate in river–ocean continuums from this study, if incorporated, can substantially improve the robustness of C cycling prediction and climate change projections in Earth system models and have far-reaching implications for developing effective zero-CO2 strategies and national C budgets, including blue C stocks. The characteristic patterns of 14C ages of riverine particles and corresponding coastal sediments demonstrate the relative importance of terrestrial inputs and marine primary production on coastal OC and how ocean margin C budgets relate to factors such as temperature, precipitation and sedimentary dynamics. Our results also point out critical regions with poor data availability (for example, high-altitude or high-latitude regions), necessitating further investigation efforts to understand their local OC dynamics and improve their prediction accuracy. Notably, this study only focuses on particulate OC, while dissolved OC (DOC) also accounts for a large fraction of C in river–ocean continuums2. Including DOC dynamics in future modelling efforts (for example, machine learning or process-based modelling) will further enhance our understanding of global C cycling, in addition to our findings on C dynamics of riverine particles and coastal sediments.

Methods

Data source and processing

Most of the OC Δ14C values of riverine particles worldwide were collected from the MOREPOC database19. The compiled Δ14C values from the database were counterchecked with the reported values in the original references to ensure data accuracy. We also searched keywords, including ‘Δ14C values/radiocarbon/OC 14C content’ and ‘Δ14C values/radiocarbon/OC 14C content of riverine particles’ in the Web of Science, ResearchGate and Google Scholar and added to our database the Δ14C data from the most recent publications (that is, those not in MOREPOC). Finally, data of 2,559 observations from 737 sampling sites distributed globally were compiled (Extended Data Fig. 1a).

The Δ14C values of coastal sediments were mainly collected from the CASCADE and MOSAIC 1.0 published databases20,21. Similarly, we added the latest literature-reported data by searching using the Web of Science, ResearchGate and Google Scholar for keywords such as ‘Δ14C values/radiocarbon/OC 14C content’ and ‘Δ14C values/radiocarbon/OC 14C content of sediments’. In the end, we compiled a global dataset of 1,325 coastal sites with Δ14C values (Extended Data Fig. 1b). In addition, 4,496 sites of TOC contents and 1,554 sites of δ13C values in global coastal sediments were collected from MOSAIC 1.0 for further analysis.

Data of all environmental variables were collected from published databases (Supplementary Table 1), with 28 variables used for riverine OC Δ14C value projections and 15 for marine OC Δ14C value projections. Pearson correlation analysis showed a weak to moderate correlation between pairs of variables within the full set of variables (Extended Data Figs. 4 and 5). Climatic variables that control primary productivity, soil microbial communities and surface weathering and erosion are potential drivers for Δ14C values of riverine particles18,32,54, including the aridity index, air humidity, mean annual precipitation, mean annual temperature, surface soil temperature and subsurface soil temperature. Geomorphology is considered an important factor that controls weathering and erosion and influences river export of particulate OC with various Δ14C values13,53. To characterize the effect of geomorphology, variables of elevation, slope, modelled sediment yield, discharge, soil loss, R factor of rainfall erosivity, K factor of soil erodibility, C factor of vegetation cover, slope length and steepness factor were included. Soil properties may influence the degradation and preservation of OC by affecting microbial activities and may change the Δ14C values of soil OC. The variables used to describe soil properties include clay, silt, sand, soil OC content, pH, cation exchange capacity and soil biodiversity. Anthropogenic activities have been demonstrated to drastically perturb the terrestrial C pool and further influence the export of soil OC18,55. To represent various anthropogenic perturbations, variables of population density, human development index and gross domestic product were used. Primary productivity indexes such as net primary productivity and belowground net primary productivity (at 20 and 200 cm depths) are also considered important factors of the Δ14C values of riverine particles and were used in this study.

The environmental variables used to predict the Δ14C values of coastal sediments were grouped into physical properties, chemical properties, climate properties, primary productivity indexes and sedimentary properties (Supplementary Table 1). The physical properties included flow velocity, tidal range, mixed-layer thickness, suspended sediment concentration and water depth. Among them, flow velocity, tidal range, mixed-layer thickness and water depth are hydrodynamic parameters that potentially influence OC degradation or preservation in coastal zones by influencing the transport and exchange of OC and oxygen, as well as their oxygen exposure time39,56. For instance, a high flow velocity, large tidal range and thick mixed layer may prolong the oxygen exposure time and hence the interaction between oxygen and OC, which accelerates the degradation of OC. Chemical properties such as salinity, pCO2 and dissolved inorganic C may influence photosynthesis and microbial activities57,58,59 and further impact the Δ14C values of coastal sediments. Climate properties such as sea surface temperature and sea subsurface temperature may also influence photosynthesis and microbial activities. Relatively high temperature can stimulate primary productivity and microbial activities, thus affecting the production and consumption of OC. Primary productivity indexes such as phytoplankton concentration and net primary productivity were also involved in model building. Sedimentary properties, such as sediment mixed-layer thickness, sediment thickness and TOC content, were the main controlling factors of OC degradation and burial in sediments26,60.

Detailed information on all of these environmental variables, including spatial resolution, time period and sources, is provided in Supplementary Table 1. To ensure the spatial correspondence of each environmental variable dataset, we resampled all of the datasets to match a 10′ × 10′ resolution.

Feature selection and machine learning models

A global dataset of 2,559 river particle observations and 1,325 coastal surface sediment observations (Extended Data Fig. 1a,b) was compiled to train and test machine learning models. A total of 28 and 15 environmental variables were selected, respectively, to build reliable predictive machine learning models for river particles and coastal sediments. For machine learning models, more variables do not necessarily improve model performance and may sometimes lead to poor model performance, unnecessary (and undesirable) high model complexity and uncertainty propagating therein61. To obtain the optimal assembly of explanatory variables, we used Pearson correlation analysis and the random forest method to filter out the most important variables. Specifically, correlations were examined between Δ14C values and each environmental variable using SPSS, and important variables were selected based on correlation coefficients and significance levels. The randomForest R package was used to train the models with different predictor combinations by examining the tenfold cross-validation and selecting the optimal combination of independent variables with the best agreement between predicted and observed Δ14C values. The optimal assembly of explanatory variables was determined by combining these two feature selection methods. Eleven variables are included in the model for predicting Δ14C values of riverine particles, including mean annual temperature, mean annual precipitation, elevation, slope, modelled sediment yield, R factor of rainfall erosivity, contents of clay and silt, soil OC content, cation exchange capacity, net primary productivity and gross domestic product. To predict Δ14C values in coastal sediments, different sets of environmental variables were used for Arctic regions and non-Arctic regions. 
The optimal variables in the Arctic region include dissolved inorganic C, salinity, \(p_{{\rm{CO}}_{2}}\), flow velocity, tidal range, net primary productivity, phytoplankton concentration, sea subsurface temperature, seawater transparency, mixed-layer thickness and water depth. The optimal variables in the non-Arctic region include TOC content, flow velocity, phytoplankton concentration, salinity, tidal range, sea subsurface temperature, sea surface temperature, water depth, dissolved inorganic C, \(p_{{\rm{CO}}_{2}}\) and sediment thickness.
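
The correlation-based part of this screening can be illustrated with a minimal sketch. It is not the SPSS or randomForest workflow used in the study: the function names, the |r| threshold of 0.2 and the toy values are assumptions made for illustration only, significance testing is omitted, and the random forest cross-validation step is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def screen_by_correlation(features, target, min_abs_r=0.2):
    """Keep variables whose |r| with the target reaches a threshold.

    `features` maps a variable name to its values; `min_abs_r` is a
    hypothetical cut-off, not a value taken from the study.
    Returns (name, r) pairs sorted by decreasing |r|.
    """
    scored = [(name, pearson_r(vals, target)) for name, vals in features.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= min_abs_r]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)


# Toy data: five sites with made-up predictor and Δ14C values.
toy_features = {
    "elevation": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
}
toy_delta14c = [-100.0, -200.0, -300.0, -400.0, -500.0]
selected = screen_by_correlation(toy_features, toy_delta14c)
```

In practice, a shortlist obtained this way would still need to be checked against cross-validated model skill, as described above, before any variable is retained or dropped.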

To build reliable models for predicting the Δ14C values of riverine particles and coastal sediments, we compared different approaches, including multivariable linear regression, k-nearest neighbour, decision tree, neural network, boosting, random forest and support vector machine (SVM). The multivariable linear regression method is mainly used to describe the linear relationship between the explanatory variables and the dependent variable. However, when the number of explanatory variables is too large, the model can overfit or underfit. The k-nearest neighbour method is a nonparametric approach that assigns weights to distances based on sample proximity, to reduce the impact of outliers and to improve model robustness25. The decision tree is a hierarchical classifier that recursively partitions a dataset into increasingly homogenous subsets (referred to as nodes) to predict class membership62. However, deep decision trees with sparse leaf nodes may lead to overfitting, thus reducing the model’s generalization ability. The neural network is usually a set of neurons that connects the input layer, hidden layers and output layer, and the key parameters are the number of hidden layers and neurons26.

Compared with the single models described above, ensemble models (including bagging, boosting and stacking algorithms) improve generalization ability by amalgamating multiple models, thereby mitigating the risk of overfitting associated with a single model. Boosting methods, such as Gradient Boosting and eXtreme Gradient Boosting, are currently the dominant tree-based ensemble learning algorithms due to their powerful and robust predictability63,64. The boosting algorithm enhances weak learners through iterative training on a dataset, adjusting their sample weights based on error rates until a set number of weak learners is attained. These are then amalgamated into a robust learner. Models such as Gradient Boosting and eXtreme Gradient Boosting, renowned for their efficacy in ensemble settings, mitigate overfitting by fine-tuning tree parameters, step size and penalty coefficients. In contrast, bagging randomly samples the dataset for each weak learner’s creation, thereby fostering multiple independent learners. Random forest advances bagging by employing decision trees as weak learners and incorporating random feature selection, thereby optimizing generalization and curbing overfitting through precise tree parameter adjustments. The SVM method operates under the assumption that the joint distribution of the input and output variables is unknown, yet a correlation between them exists65. The SVM method projects the input features into a higher-dimensional feature space using kernel functions, thereby transforming linearly inseparable features to separable ones and iteratively adjusting the hyperplane to find the optimum solution66,67. SVM can enhance prediction accuracy by utilizing the outputs of multiple models as inputs to train a new model through a stacking algorithm.

All compiled datasets were randomly divided into two parts, with 70% of the full dataset used for training and the remaining 30% used for testing. Traditional evaluation methods may lack accuracy when assessing a model’s performance with limited test data, often due to the potential lack of representativeness. Cross-validation mitigates this by partitioning the data into multiple folds and iteratively training and validating the model across different folds. This approach yields a more reliable and consistent evaluation of performance, which further reduces the risk of overfitting or underfitting. A tenfold cross-validation method was used to evaluate the model performance. During this tenfold cross-validation process, data for model training were divided into ten subsets of the same size; in each iteration, nine subsets were used for model training and the remaining subset was used for model validation. To reduce the uncertainty associated with stochastic sampling and identify the most predictive models, we trained the machine learning models 100 times and optimized for the highest R2. In each run, we employed three iterations of tenfold cross-validation for model training. In addition to ensuring an adequate sample size in the training dataset, employing ensemble models and applying cross-validation, suitable feature selection can also help to mitigate overfitting because it can reduce the model complexity and reduce the impact of irrelevant or redundant features. For Δ14C in riverine particles and coastal sediments, we respectively constructed several models based on the compiled datasets, using the randomForest R package with feature selection and the MATLAB toolbox (the Statistics and Machine Learning Toolbox and Deep Learning Toolbox) without feature selection. We evaluated modelling performance by comparing the coefficient of determination (R2) and mean absolute error value; higher R2 and lower mean absolute error values represent better model performance. 
Among the seven models used, the SVM performed the best. Optimizing hyperparameters in the SVM, such as the regularization parameter C and the type of kernel function, can improve model effectiveness. Regularization reduces model complexity, prevents overfitting and improves generalization to new data. The kernel function in SVM addresses nonlinearity in data and the radial basis kernel function is often preferred because of its better simplicity and performance compared with linear and polynomial kernels in most cases. To identify the optimal combination of hyperparameter settings for the SVM, we conducted hyperparameter tuning using the tenfold cross-validation method for the environmental variables based on grid search and optimization algorithms. Ultimately, the particle swarm optimization SVM (PSO-SVM), based on biological optimization algorithms, showed the best performance in predicting Δ14C values of riverine particles among a series of SVMs; the grid search method-based SVM (GSM-SVM) showed the best performance in predicting Δ14C values, δ13C values and TOC contents in coastal sediments. The PSO-SVM was therefore used to predict the Δ14C values in global river particles, and the GSM-SVM was used for Δ14C values, δ13C values and TOC contents in global coastal sediments.
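
The 70/30 split, the tenfold cross-validation and the two evaluation metrics can be sketched as follows. This is an illustrative outline only, not the R or MATLAB code used in the study: all function names, the random seeds and the one-variable straight-line model `fit_line` (a stand-in for the SVM and the other learners) are assumptions.

```python
import random


def r2_score(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Randomly split indices 0..n-1 into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=10, seed=0):
    """Yield (train, validation) index lists for each of k folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(x, y, indices, fit, k=10, seed=0):
    """Mean R2 and mean MAE over k folds.

    `fit(xs, ys)` must return a function that predicts y from one x.
    """
    r2s, maes = [], []
    for train, val in k_fold_indices(indices, k, seed):
        predict = fit([x[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in val]
        pred = [predict(x[i]) for i in val]
        r2s.append(r2_score(obs, pred))
        maes.append(mean_absolute_error(obs, pred))
    return sum(r2s) / k, sum(maes) / k


def fit_line(xs, ys):
    """Hypothetical stand-in model: least-squares line for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
             / sum((a - mean_x) ** 2 for a in xs))
    return lambda v: mean_y + slope * (v - mean_x)
```

Calling `cross_validate` repeatedly with different `seed` values would mimic repeated cross-validation; the held-out 30% from `train_test_split_indices` is left untouched until the final test.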

Conversion from Δ14C values to 14C ages

Radiocarbon data are variably reported as Δ14C values, fraction modern (Fm) values and/or radiocarbon ages (14C ages). We used the following formulas68 to convert Δ14C values to 14C ages:

$$\Delta^{14}{\rm{C}}=1{,}000\times \left({F}_{{\rm{m}}}\times {e}^{-\lambda \left(y-1{,}950\right)}-1\right)$$
(1)
$${F}_{{\rm{m}}}=\left(\frac{\Delta^{14}{\rm{C}}}{1{,}000}+1\right)\times {e}^{\lambda \left(y-1{,}950\right)}$$
(2)
$${}^{14}{\rm{C}}_{{\rm{age}}}=-8{,}033\times \ln \left({F}_{{\rm{m}}}\right)$$
(3)

whereby λ = 1/8,267 yr−1 and y is the year of sample collection and measurement. Notably, we assume identical years of collection and measurement because such information is typically not reported and because their minor difference (less than 5 years) does not introduce a significant error in the context of this study17.
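
As a minimal sketch of equations (1) to (3), the conversion can be written as below. The function and constant names are illustrative assumptions; only the constants 8,267 and 8,033 and the reference year 1950 come from the text.

```python
import math

LAMBDA = 1.0 / 8267.0  # yr^-1, decay constant used in equations (1) and (2)
AGE_FACTOR = 8033.0    # yr, factor used in equation (3)


def fraction_modern(delta14c, year):
    """Equation (2): Fm from a Δ14C value (per mil) and the sampling year."""
    return (delta14c / 1000.0 + 1.0) * math.exp(LAMBDA * (year - 1950))


def delta14c_from_fm(fm, year):
    """Equation (1): Δ14C (per mil) from Fm and the sampling year."""
    return 1000.0 * (fm * math.exp(-LAMBDA * (year - 1950)) - 1.0)


def c14_age(delta14c, year):
    """Equation (3): 14C age in years from a Δ14C value and the sampling year."""
    fm = fraction_modern(delta14c, year)
    if fm <= 0.0:
        raise ValueError("Fm must be positive; check that Δ14C > -1000")
    return -AGE_FACTOR * math.log(fm)
```

For example, `c14_age(-200.0, 2010)` applies equation (2) and then equation (3) to a hypothetical sample with Δ14C = −200‰ collected in 2010. Values of Fm above 1 return negative ages, which correspond to modern, bomb-influenced carbon.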

Global mapping of predictors

For riverine particles, the geographic location of prediction points is defined according to the classical ordering system in the attribute table of the HydroBASINS database69. In this classical ordering system, order 1 represents the main stem river from sink to source, order 2 represents all tributaries that flow into an order 1 river, order 3 represents all tributaries that flow into an order 2 river and so on.

First, we calculated the coordinates of the geometric centre of each sub-basin (corresponding to river order 7) in each river basin using the Calculate Geometry function in the ArcGIS attribute table. Then, we allocated the predicted riverine Δ14C values at the sub-basin centre to the nearest order 7 river using the Near function in ArcToolbox. In total, 22,442 data points were allocated across global river basins.
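
The allocation step can be approximated outside ArcGIS with a nearest-neighbour search. The sketch below is not the Near tool: it assumes that each river is represented by a single point, uses great-circle (haversine) distances on a sphere of radius 6,371 km, and all names and data structures are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def allocate_to_nearest(centres, river_points):
    """Assign each sub-basin centre value to its nearest river point.

    centres: list of (lat, lon, predicted_value)
    river_points: list of (river_id, lat, lon)
    Returns a dict mapping river_id to the list of allocated values.
    """
    allocated = {}
    for lat, lon, value in centres:
        nearest = min(river_points,
                      key=lambda p: haversine_km(lat, lon, p[1], p[2]))
        allocated.setdefault(nearest[0], []).append(value)
    return allocated
```

This brute-force search compares every centre with every river point; for tens of thousands of points a spatial index would be a more practical choice.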

For coastal sediments, global maps with a consistent spatial resolution of 10′ × 10′ were generated using ArcGIS Pro, representing 99,807 sites of the coastal margins with water depths of no more than 200 m. First, the attributes of selected explanatory variables at each site were extracted in ArcGIS Pro and exported to the predictor database. Then, we fed the predictor database to the trained GSM-SVM to predict the gridded Δ14C values, TOC contents and δ13C values in global coastal sediments.