Abstract
Unsupervised machine learning methods were applied to multivariate geophysical and geochemical datasets of ocean floor sediment cores collected from the South China Sea. The well-preserved, continuous core samples comprise high-resolution Cenozoic sediment records that enable detailed paleoenvironmental studies. Bayesian age-depth chronological models, constructed from biostratigraphic control points for the drilling sites, are applied to cluster boundaries generated by two popular unsupervised learning methods: K-means and random forest. Both methods produced compact and unambiguous clusters, indicating that previously unknown data patterns can be revealed when all variables in the datasets are considered simultaneously. The synchroneity of past events represented by the cluster boundaries is studied across geographically separated ocean drilling sites by converting the fixed depths of the boundaries into chronological ranges, represented by Gaussian density plots, which are then compared with known past events in the region. A Gaussian density peak at around 7.2 Ma has been identified in the results from all three sites, and it is suggested to coincide with the initiation of the East Asian monsoon. In contrast to traditional statistical approaches, unsupervised learning requires no a priori assumptions, and the clustering results serve as a novel data-driven proxy for studying the complex and dynamic processes of the paleoenvironment surrounding the ocean sediment. This work serves as a pioneering approach to extracting valuable information about regional events and opens up a systematic and objective way to study the vast global ocean sediment datasets.
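The workflow summarized in the abstract (cluster the downcore measurements, locate the cluster boundaries, convert boundary depths to ages, and express them as Gaussian densities) can be illustrated with a minimal sketch. This is not the authors' implementation. The data are synthetic, the control points and the fixed age uncertainty `age_sd` are hypothetical, and simple linear interpolation between control points stands in for the Bayesian age-depth model. The function names `kmeans`, `cluster_boundaries`, and `boundary_age_density` are likewise illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one integer label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center unchanged if it has no members
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_boundaries(depth, labels):
    """Midpoint depths at which the cluster label changes downcore."""
    change = np.flatnonzero(labels[1:] != labels[:-1])
    return 0.5 * (depth[change] + depth[change + 1])

def boundary_age_density(boundary_depths, ctrl_depth, ctrl_age, age_sd, age_grid):
    """Sum of Gaussian densities, one per boundary, on a shared age axis.

    Assumption: linear interpolation plus a fixed standard deviation
    replaces the posterior of a Bayesian age-depth model.
    """
    ages = np.interp(boundary_depths, ctrl_depth, ctrl_age)
    z = (age_grid[None, :] - ages[:, None]) / age_sd
    pdf = np.exp(-0.5 * z ** 2) / (age_sd * np.sqrt(2.0 * np.pi))
    return ages, pdf.sum(axis=0)

# Synthetic core: two measured variables that shift together below 150 m.
depth = np.linspace(0.0, 300.0, 301)
rng = np.random.default_rng(1)
step = (depth > 150.0).astype(float)
X = np.column_stack([
    5.0 * step + rng.normal(0.0, 0.1, depth.size),
    -3.0 * step + rng.normal(0.0, 0.1, depth.size),
])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each variable

labels = kmeans(X, k=2)
bounds = cluster_boundaries(depth, labels)

# Hypothetical biostratigraphic control points (depth in m, age in Ma).
ctrl_depth = np.array([0.0, 100.0, 300.0])
ctrl_age = np.array([0.0, 4.0, 12.0])
age_grid = np.linspace(0.0, 12.0, 1201)
ages, density = boundary_age_density(bounds, ctrl_depth, ctrl_age, 0.3, age_grid)
peak_age = age_grid[density.argmax()]
```

Running the same steps for each drilling site and overlaying the resulting `density` curves on one age axis is one way to check whether boundaries from separate sites fall within overlapping chronological ranges.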
Cite this article
Tse, K.C., Chiu, HC., Tsang, MY. et al. An unsupervised learning approach to study synchroneity of past events in the South China Sea. Front. Earth Sci. 13, 628–640 (2019). https://doi.org/10.1007/s11707-019-0748-x
DOI: https://doi.org/10.1007/s11707-019-0748-x