Oceanic sediment accumulation rates predicted via machine learning algorithm: towards sediment characterization on a global scale

Observed vertical sediment accumulation rates (n = 1031) were gathered from ~ 55 years of peer reviewed literature. Original methods of rate calculation include long-term isotope geochronology (14C, 210Pb, and 137Cs), pollen analysis, horizon markers, and box coring. These observations are used to create a database of global, contemporary vertical sediment accumulation rates. Rates were converted to cm year−1, paired with the observation’s longitude and latitude, and placed into a machine learning–based Global Predictive Seabed Model (GPSM). GPSM finds correlations between the data and established global “predictors” (quantities known or estimable everywhere, e.g., distance from coastline and river mouths). The result, using a k-nearest neighbor (k-NN) algorithm, is a 5-arc-minute global map of predicted benthic vertical sediment accumulation rates. The map generated provides a global reference for vertical sedimentation from coastal to abyssal depths. Areas of highest sedimentation, ~ 3–8 cm year−1, are generally river mouth proximal coastal zones draining relatively large areas with high maximum elevations and with wide, shallow continental shelves (e.g., the Gulf of Mexico and the Amazon Delta), with rates falling exponentially towards the deepest parts of the oceans. The exception is Oceania, which displays significant vertical sedimentation over a large area without draining the large drainage basins seen in other regions. Coastal zones with relatively small drainage basins and steep shelves display vertical sedimentation of ~ 1 cm year−1, which is limited to the near shore when compared with shallow, wide margins (e.g., the western coasts of North and South America). Abyssal depth rates are functionally zero at the time scale examined (~ 10−4 cm year−1) and increase one order of magnitude near the Mid-Atlantic Ridge and at the Galapagos Triple Junction.


Introduction
The properties and distribution of seafloor sediment are controlled primarily by coastal processes that are dynamic, changing in response to both anthropogenic and natural stimuli. These properties, which influence a large swath of the benthic environment including faunal habitat, carbon sequestration, and seafloor stability, vary with water depth and proximity to sediment sources, primarily river mouths. The inherent dynamism of coastal regions, especially those proximal to a river outlet, makes prediction of subaqueous sediment properties complex. While several coastlines are well studied, with substantial efforts underway to describe hazards, sediment behavior, and characteristics (e.g., the Gulf Coast of the USA and the Norwegian Coast), regional studies are, by definition, geographically narrow in scope. Researchers must also implement a suite of physical tests to analyze sediment characteristics and behavior, including coring, radioisotope-based sedimentation rate calculations, grain size analysis, loss-on-ignition tests, and density analysis (e.g., Nittrouer and Sternberg 1981; Richardson et al. 2002; Keller et al. 2017; Restreppo et al. 2019). These tests require in situ sample collection, intensive subsampling, and prolonged laboratory analysis to assemble a usable data set. Spread across tens or hundreds of cores, this work could require months to years of labor, as well as special permitting to acquire samples in sensitive areas. Therefore, a less regionally focused, more expedient method is needed for global scale sediment characterization.
By collecting coastal and oceanic sediment accumulation rates from 89 peer reviewed sources spanning the past ~ 55 years and pairing the data with the US Naval Research Laboratory's Global Predictive Seabed Model (GPSM), a 5-arc-minute global map of vertical sedimentation rates is generated, spanning from the coastal zone to the abyssal plain across all oceans. Of special interest are areas in which sedimentation patterns have been understudied or where complete gaps in data exist. Vertical accumulation, when predicted at the global scale, provides a window into where sediment of all types is aggrading onto the seafloor. A precise model of vertical sediment accumulation enables researchers to link ongoing research, whether pertaining to geology, biology, engineering, etc., to sediment input into a given region.

Predictive machine learning in the geosciences
Predictive machine learning has been used to estimate unknown quantities using a set of known quantities from "previously solved cases" for quite some time (Friedman 2006). In the geosciences, these "previously solved cases" include quantifiable real-world observations, such as rates of sedimentation, grain size and content, and isotopic ratios. The application of predictive machine learning to the geosciences is a somewhat recent trend, focusing on topics such as hazard prediction, mineral prospecting, seafloor sediment porosity, behavioral characterization of rock masses, and remote sensing (Goetz et al. 2015; Martin et al. 2015; Rodriguez-Galiano et al. 2015; Lary et al. 2016). Efforts are already underway to predict deltaic changes in response to anthropogenic impacts (damming, water withdrawal, etc.) that decrease river discharge (Nienhuis et al. 2018). Additionally, research has been done to predict the influence of waves on sediment dispersal along coastlines adjacent to a river outlet (Nienhuis et al. 2015). These studies, however, are focused principally on channel morphology and near river outlet sediment transport; studies on predictive benthic sediment characterization are limited. Modeling of seabed parameters has been attempted by Huang et al. (2011) and Li et al. (2012). Both studies endeavor to quantify the composition of benthic sediment in Australia, providing valid insight into regional sedimentary behavior and related problems. However, there exists no synthesized map at the global scale.
Studies that pair real world observations with machine learning techniques (k-nearest neighbor and random forest) have been used to successfully predict total organic carbon on the seafloor, as well as seafloor porosity in areas with gaps in data, on a global scale (Martin et al. 2015; Lee et al. 2019). In both cases, resolution of 5 × 5 arc-minutes was achieved.
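To give a sense of what a 5 × 5 arc-minute resolution implies, the short sketch below computes the dimensions of a global latitude/longitude grid at that spacing and the approximate east-west width of a cell. It assumes a regular grid covering the full globe and a spherical Earth of radius 6371 km; the function names are illustrative and are not part of GPSM.

```python
import math

ARC_MINUTES = 5  # grid spacing stated in the text


def grid_shape(arc_minutes: float) -> tuple[int, int]:
    """Rows and columns of a regular global latitude/longitude grid."""
    cells_per_degree = 60 / arc_minutes
    return int(180 * cells_per_degree), int(360 * cells_per_degree)


def cell_width_km(arc_minutes: float, latitude_deg: float) -> float:
    """Approximate east-west cell width on a spherical Earth (assumption)."""
    earth_radius_km = 6371.0  # mean radius, spherical approximation
    arc_rad = math.radians(arc_minutes / 60)
    return earth_radius_km * arc_rad * math.cos(math.radians(latitude_deg))


rows, cols = grid_shape(ARC_MINUTES)
print(rows, cols, rows * cols)          # 2160 4320 9331200
print(cell_width_km(ARC_MINUTES, 0.0))  # width at the equator, in km
print(cell_width_km(ARC_MINUTES, 60.0)) # cells narrow towards the poles
```

A grid of this size holds roughly nine million cells, most of them far from any of the observations, which is why prediction from globally available quantities is needed.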

Methods
Vertical sediment accumulation rates from peer-reviewed literature from the year 1965 to the present were gathered for use in GPSM (n = 1031; Table 1). These rates span coastal environments from the near shore and river-mouth proximal subaqueous deltas to the Mid-Atlantic Ridge and abyssal plain of the Pacific (Fig. 1). It should be noted that these rates do not differentiate between terrigenous and biogenic sedimentation. Carbon rich, biologically derived sediment is only a small percentage (roughly 1%) of total sediment settling on the ocean floor: ~ 160 Mt year−1 (Hedges and Keil 1995; Smith et al. 2015) compared with ~ 13,500–19,100 Mt year−1 of fluvial sediment flux to the oceans calculated by Milliman and Meade (1983) and Milliman and Farnsworth (2011). As such, we treat sedimentation as being generally terrigenous in origin. Rates were subsequently converted to cm year−1 and transformed logarithmically to emphasize deep sea sedimentation, which is not apparent outside of logarithmic space.
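The conversion and transformation step can be sketched as follows. The unit labels, the conversion table, and the use of a base-10 logarithm are assumptions made for illustration; the source states only that rates were converted to cm year−1 and transformed logarithmically.

```python
import math

# Conversion factors to cm per year. The set of input units is hypothetical.
TO_CM_PER_YEAR = {
    "cm/yr": 1.0,
    "mm/yr": 0.1,
    "cm/kyr": 1.0e-3,
    "m/Myr": 1.0e-4,
}


def to_cm_per_year(value: float, unit: str) -> float:
    """Convert a reported accumulation rate to cm per year."""
    return value * TO_CM_PER_YEAR[unit]


def log_rate(rate_cm_per_year: float) -> float:
    """Base-10 logarithm of a strictly positive rate (base is an assumption)."""
    if rate_cm_per_year <= 0:
        raise ValueError("rates must be positive to be log-transformed")
    return math.log10(rate_cm_per_year)


# Illustrative values only, not observations from the database.
for value, unit in [(30.0, "mm/yr"), (1.0, "cm/yr"), (0.1, "cm/kyr")]:
    rate = to_cm_per_year(value, unit)
    print(unit, rate, log_rate(rate))
```

In logarithmic space a coastal rate of a few cm year−1 and an abyssal rate near 10−4 cm year−1 are separated by about four units, so variation among deep sea rates is no longer compressed towards zero.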
The k-nearest neighbor (k-NN) algorithm used in this study has been detailed in great technical depth in Lee et al. (2019). To briefly summarize, k-NN uses the parametrically, not geospatially, nearest observed data points to calculate probable values in an area with no data. The parametric distance is calculated using predictor grids, which are known or estimated properties of the water column and seafloor (e.g., water depth and distance from a river mouth) that are available globally. Predictor grids are generated from previously published research and open databases, for instance the 1/12° global HYCOM+NCODA Ocean Reanalysis (https://hycom.org/publications/acknowledgements/ocean-reanalysis-data) and NASA's MODIS Aqua mission (https://modis.gsfc.nasa.gov).
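A minimal sketch of prediction by parametric distance is given below. It is not the GPSM implementation described in Lee et al. (2019): the Euclidean distance, the unweighted mean of neighbor values, the assumption that predictors are already scaled to comparable ranges, and all data values are illustrative choices.

```python
import math
from statistics import mean


def knn_predict(train_x, train_y, query, k=5):
    """Average the targets of the k observations nearest to `query`
    in predictor space (Euclidean distance over predictor values)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of observations")
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    return mean(y for _, y in ranked[:k])


# Hypothetical, pre-scaled predictors: (water depth, distance to river mouth)
observed_x = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.3),
              (0.9, 0.8), (1.0, 0.9), (0.8, 1.0)]
# Hypothetical log10 accumulation rates (cm per year) at those observations
observed_y = [0.5, 0.3, 0.1, -3.8, -4.0, -3.9]

# Two grid cells with no data, described only by their predictor values
shallow_cell = knn_predict(observed_x, observed_y, (0.15, 0.15), k=3)
deep_cell = knn_predict(observed_x, observed_y, (0.95, 0.90), k=3)
print(shallow_cell, deep_cell)
```

Note that longitude and latitude never enter the distance: a cell borrows values from observations that resemble it in predictor space, wherever on the globe those observations were made.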
To select the most relevant predictors, tenfold validation is used. Tenfold validation withholds a random 10% of all observations, with the remaining 90% used to predict. This is repeated until all points have been withheld and used in prediction. During feature selection, the prediction evaluates each predictor grid individually and then cross validates to define the predictive skill of each individual grid. This is then compared with uniform random noise grids. Only the best predictors are used to define the final prediction and error; grids with errors higher than random noise grids are discarded.
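The screening procedure described above can be sketched as follows. The error metric (root-mean-square error), the fold assignment, the single noise grid, and the synthetic data are assumptions for illustration; the source does not specify these details.

```python
import math
import random
from statistics import mean


def knn_predict(train_x, train_y, query, k):
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    return mean(y for _, y in ranked[:k])


def tenfold_rmse(xs, ys, k=5, folds=10, seed=0):
    """Root-mean-square error from tenfold cross-validation of k-NN."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    squared_errors = []
    for fold in range(folds):
        held_out = set(order[fold::folds])  # roughly 10% of the points
        train_x = [xs[i] for i in order if i not in held_out]
        train_y = [ys[i] for i in order if i not in held_out]
        for i in held_out:
            predicted = knn_predict(train_x, train_y, xs[i], k)
            squared_errors.append((predicted - ys[i]) ** 2)
    return math.sqrt(mean(squared_errors))


def screen_predictors(predictors, ys, k=5, seed=0):
    """Keep predictors whose error is lower than that of a uniform noise grid."""
    rng = random.Random(seed)
    noise = [(rng.random(),) for _ in ys]
    noise_error = tenfold_rmse(noise, ys, k=k, seed=seed)
    kept = {}
    for name, values in predictors.items():
        error = tenfold_rmse([(v,) for v in values], ys, k=k, seed=seed)
        if error < noise_error:
            kept[name] = error
    return kept, noise_error


# Synthetic example: one predictor drives the target, the other does not.
rng = random.Random(42)
informative = [rng.random() for _ in range(200)]
unrelated = [rng.random() for _ in range(200)]
log_rates = [-4.0 * x + rng.gauss(0.0, 0.1) for x in informative]

kept, noise_error = screen_predictors(
    {"informative": informative, "unrelated": unrelated}, log_rates)
print(sorted(kept), noise_error)
```

Each predictor is scored on its own, so a grid survives only if it carries more predictive skill than random numbers would by chance.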
The k-NN method is simpler than other available predictive methods (random forest, for example), as the only hyperparameter to manipulate within the algorithm is the number of nearest neighbors used in the prediction of each point (k). Generally, if too few neighbors are selected, individual data points begin to overinfluence the predictions; if the k value is too high, oversmoothing occurs, wherein predictions begin to reflect the mean of all observed data (Zhang 2016). Accordingly, several predictions must be run to establish a reasonable k value that minimizes standard deviation and error while maximizing the validation R² value. For this prediction, five nearest neighbors (k = 5) were used.
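The search for a reasonable k can be illustrated with a small experiment on synthetic data. This sketch scores each candidate k by the cross-validated R² alone, computed here as one minus the ratio of residual to total sum of squares; the candidate values, the data, and the omission of the standard deviation criterion are assumptions made to keep the example short.

```python
import math
import random
from statistics import mean


def knn_predict(train_x, train_y, query, k):
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    return mean(y for _, y in ranked[:k])


def cross_validated_predictions(xs, ys, k, folds=10, seed=0):
    """Predict every point from the folds that do not contain it."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    predictions = [0.0] * len(xs)
    for fold in range(folds):
        held_out = set(order[fold::folds])
        train_x = [xs[i] for i in order if i not in held_out]
        train_y = [ys[i] for i in order if i not in held_out]
        for i in held_out:
            predictions[i] = knn_predict(train_x, train_y, xs[i], k)
    return predictions


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    centre = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - centre) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic data: a smooth signal plus noise (not sedimentation data).
rng = random.Random(1)
xs = [(rng.random(),) for _ in range(150)]
ys = [math.sin(6.0 * x[0]) + rng.gauss(0.0, 0.3) for x in xs]

scores = {k: r_squared(ys, cross_validated_predictions(xs, ys, k))
          for k in (1, 5, 25, 100)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With k = 1 each prediction inherits the noise of a single observation, while with k = 100 the predictions drift towards the mean of the data and R² falls, which is the oversmoothing behavior described above.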

Results
The highest sedimentation rates (~ 3–8 cm year−1) occur proximal to river outlets that drain large basins with high mountainous areas that are paired with wide, shallow continental shelves, e.g., the Amazon Delta, the Huang He Delta, and the Mississippi River Delta (Fig. 2; DOI: https://doi.org/10.26022/IEDA/329769). The exception to this pattern is Southeast Asia and nearby islands, collectively referred to herein as Oceania, which displays the highest intensity of sedimentation over the largest area without the large drainage basins mentioned previously. Generally, shelf zones around all continents and islands display vertical sedimentation between approximately 0.1 and 1 cm year−1 (Fig. 2).
Rates in the deep ocean steadily decline by orders of magnitude towards deeper basins and away from subaerial land masses. The Pacific and Indian oceans contain the lowest rates. Rates increase slightly along the Mid-Atlantic Ridge and at the conjunction of the Cocos, Nazca, and Pacific tectonic plates, but remain functionally zero (~ 10−4 cm year−1). The authors theorize that the increase is a result of both localized sedimentation resulting from tectonic processes and the associated elevated bathymetry, but it may also be an artifact of the prediction process.
Table 1 (partial) Sources of observed sedimentation rates, by region
Antarctic: Boldt et al. 2013; Isla et al. 2002; Masqué et al. 2002
Pacific (includes western North and South America): Alexander and Lee 2009; Alexander and Venherm 2003; Berger and Killingley 1982; Botwe et al. 2017; Cochran and Krishnaswami 1980; Hartmann et al. 1976; Ingall and Cappellen 1990; Koide et al. 1972; Lao et al. 1992; McMurtry 1981; Mitchell 1998; Müller and Suess 1979; Nittrouer et al. 1984; Stevenson and Cheng 1972; Thorbjarnarson et al. 1986; Wheatcroft and Sommerfield 2005
[Figure color scale: Depth/Elevation (km)]
The coefficient of determination for the validation portion of the model is high (R² = 0.89; Fig. 3). Standard deviation of the prediction is lowest in the deep oceans and along most coastlines; deviation increases along portions of the islands of Southeast Asia, a section of the northernmost Atlantic Ocean, and across the Arctic Ocean (Fig. 4). The highest correlated predictor grids include river mouth total suspended solids (TSS), dissolved organic carbon, wave direction, mean decadal sea salinity, and megafauna biomass, of which river mouth TSS was ranked highest (Table 2).

Discussion
The predictive map of Fig. 2 is very much in line with the work of Milliman and Farnsworth (2011), which quantifies fluvial sediment discharge into the oceans from all regions, and Milliman and Syvitski (1992), which illustrates the relationship between drainage basin area, maximum elevation, and sediment flux from rivers to the oceans. Milliman and Farnsworth (2011) identify the "Austral-Asian Rivers" region, i.e., Northern Australia, the islands of Southeast Asia, and Southeast Asia proper, which we refer to as Oceania, as discharging the most fluvial sediment into the oceans (12,500 Mt year−1), followed distantly by the Amazon region (1600 Mt year−1). Milliman and Syvitski (1992) also acknowledge that the Oceania region yields the highest sediment loads, followed by the fluvial systems draining the Himalayas. In our prediction, the Oceania region is predicted to generate substantial vertical sedimentation over the largest area. The Amazon, Mississippi, and Ganges-Brahmaputra Deltas, among others, generate a similar magnitude of vertical sedimentation; however, the area of influence is limited compared with Oceania. Our prediction also highlights the Caribbean and the Northeastern USA as locations of high vertical sedimentation. It should be noted that the Caribbean and Northeastern USA are not identified as contributing significant fluvial sediment to the oceans, according to Milliman and Farnsworth (2011); yet there is evidence of increased fluvial sediment discharge as a result of the twentieth century deforestation and urban development of the Caribbean islands (Alonso-Hernandez et al. 2006). Additionally, there are data indicating significant input of terrigenous sediment into the Caribbean originating from the Amazon and Orinoco Rivers of northern South America, especially during periods of high sea level (Bowles and Fleischer 1985). Thus, this carbonate rich area may be aggrading vertically with input from sediment carried from some distance away. As for the Northeastern USA, several rivers, including the Potomac and Hudson, discharge sediment originating from the northeastern Appalachian Mountains (Thompson 1939).
In Milliman and Farnsworth (2011), the third highest sediment discharging region is the west coast of North America. In our prediction, this area does not appear to generate significant vertical sedimentation when compared with areas with wide, shallow margins, described previously. This may be attributed to the relatively steep bathymetric gradient of western North America, as a steep continental shelf may negate vertical sedimentation potential from a high TSS fluvial source, such as the Columbia River. First, it appears that in order for sediment to considerably accumulate vertically, depth and the gradient towards deeper bathymetry must be relatively shallow. Second, the high amount of fresh sediment settling onto these margins leaves these regions susceptible to subaqueous landslides, which may retard overall upslope accumulation; the topic of slope stability is further addressed in the next section. Finally, Milliman and Syvitski (1992) remark that the steep gradient and proximity to sediment source of these small mountainous rivers equates to larger bedload, which is not usually included in sediment discharge values.
Coastal zones that experience high sedimentation are more susceptible to developing slope stability issues, as the weight of the newly added sediment increases shear stress, which can lead to slope failure and trigger powerful downslope flows in already unstable areas, even on very low gradient slopes (Coleman and Garrison 1977; Clare et al. 2016; Maloney et al. 2018, 2020). Additionally, newly deposited sediment lacks the consolidated cohesive strength of older sediment, compounding the problem of an already unstable seafloor (Obelcz et al. 2017). This may account for the lack of predicted sedimentation on the west coast of North and South America. In addition, the steep gradient along the continental shelf allows for the formation of turbidity currents, which quickly move sediment from the shelf edge to abyssal depths. We have not developed any predictors related to turbidity flows, so GPSM cannot account for this process, which resuspends and transports sediment to deeper waters. With regard to wider, shallower coastal shelves, e.g., the Gulf of Mexico or the Eastern USA, the concern is focused on areas where dredging occurs, leaving behind steep walls of sediment with the potential for collapse, for instance in the Gulf of Mexico (Robichaux et al. 2020). Identification of these high sedimentation areas is vital in that important underwater infrastructure can be protected against destructive flows only if the threat is recognized.
While the majority of our prediction is substantiated by real world data and documented patterns of fluvial sediment export and migration, some elevated vertical sedimentation rates are predicted in areas with no documented significant sediment input. For example, southcentral Australia is not considered to be a significant source of sediment to the oceans (Milliman and Farnsworth 2011). Yet, St. Vincent Gulf, Spencer Gulf, and Venus Bay are all highlighted as areas of significant vertical accumulation. Unfortunately, without field work and isotope geochronology to confirm the sedimentation rates in these areas, it remains unknown if the model is correct or if GPSM is simply projecting high sedimentation values based on commonalities with other high sediment yield bays in other parts of the world. Nonetheless, it is partially the purpose of GPSM to highlight areas of interest that have had little prior study. Finally, the amount of sediment delivered to coastal zones via rivers is generally agreed to be declining due to natural and anthropogenic factors (Fan et al. 2006; Bentley et al. 2016; Bergillos et al. 2016; Maloney et al. 2018). As new, more recent sedimentation data is generated, a data driven prediction such as ours will change.

Conclusion
Presented herein is the first map of global sedimentation rates that provides a well-founded view of global, benthic sedimentation patterns with quantitative uncertainties. However, this is not a final prediction; the model is ever-evolving as new data on sedimentation rates is generated or discovered and added to the dataset. The principal results from this prediction are:
1. East Asia and Oceania are associated with the highest quantity of vertical sedimentation over the largest region. These results are in line with the fluvial TSS flux to the oceans in Milliman and Farnsworth (2011) and Milliman and Syvitski (1992).
2. The deep oceans aggrade vertically at a rate that is functionally zero at the yearly time scale, increasing one order of magnitude at the Mid-Atlantic Ridge and at the conjunction of the Pacific, Cocos, and Nazca tectonic plates.
3. The GPSM prediction confirms that the extent of vertical sedimentation appears dependent on drainage basin area, maximum elevation within the basin, and secondarily on bathymetric gradient and depth (Milliman and Syvitski 1992).
4. Identification of areas with substantial vertical sedimentation allows for the recognition of unstable, slope failure prone regions, as the two are linked. This is of special importance in areas with sensitive infrastructure on or buried in the sea floor such as offshore oil platforms and pipelines.
These results are only the first steps towards machine learning-based predictions of global sediment dynamics and characteristics. The methods used herein are currently being applied to several aspects of benthic sediment characterization in the oceans, including grain size, composition, and mass accumulation, where there exist available real-world observations for use in GPSM.
Acknowledgments The authors would like to extend gratitude to Dr. Clark Alexander and Dr. Samuel Bentley for providing data sets for use in this manuscript, as well as Dr. Jeff Obelcz and Taylor Lee of the Naval Research Laboratory for input during the modeling phase of the project.
Funding information Giancarlo A. Restreppo is a postdoctoral associate funded by the National Academies of Sciences, Engineering, and Medicine's NRC Research Associateship.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.