Frontiers of Earth Science, Volume 13, Issue 3, pp 628–640

An unsupervised learning approach to study synchroneity of past events in the South China Sea

  • Kevin C. Tse
  • Hon-Chim Chiu
  • Man-Yin Tsang
  • Yiliang Li
  • Edmund Y. Lam
Research Article


Unsupervised machine learning methods were applied to multivariate geophysical and geochemical datasets from ocean floor sediment cores collected in the South China Sea. The well-preserved, continuous cores hold high-resolution Cenozoic sediment records that allow detailed paleoenvironmental study. Bayesian age-depth chronological models, constructed from biostratigraphic control points for the drilling sites, are applied to cluster boundaries generated by two popular unsupervised learning methods: K-means and random forest. Both methods produced compact and unambiguous clusters from the datasets, indicating that previously unknown data patterns can be revealed when all variables are considered simultaneously. Synchroneity of the past events represented by the cluster boundaries is studied across geographically separated ocean drilling sites by converting the fixed depths of the cluster boundaries into chronological ranges, represented as Gaussian density plots, which are then compared with known past events in the region. A Gaussian density peak at around 7.2 Ma was identified in the results from all three sites and is suggested to coincide with the initiation of the East Asian monsoon. Unlike traditional statistical approaches, unsupervised learning requires no a priori assumptions, and the clustering results serve as a novel data-driven proxy for studying the complex and dynamic paleoenvironmental processes surrounding the ocean sediment. This work is a pioneering approach to extracting valuable information about regional events and opens up a systematic and objective way to study the vast global ocean sediment datasets.
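The workflow summarised in the abstract (cluster the downcore measurements, locate the cluster boundaries, convert boundary depths to ages with uncertainty) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the control points are hypothetical, the k-means initialisation is a simple farthest-point rule, and the Bayesian age-depth model is replaced by a crude stand-in that jitters control-point ages and interpolates linearly without enforcing monotonicity. The random forest variant is not shown.

```python
import random
import statistics


def standardize(rows):
    """Z-score each variable so no single measurement dominates distances."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Lloyd's k-means with a deterministic farthest-point start (illustrative)."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: nearest center for every sample.
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_boundaries(depths, labels):
    """Midpoint depths where the cluster label changes downcore."""
    return [(depths[i] + depths[i + 1]) / 2
            for i in range(len(labels) - 1) if labels[i] != labels[i + 1]]


def depth_to_age(depth, control_points):
    """Linear interpolation between (depth, age) control points."""
    for (d0, a0), (d1, a1) in zip(control_points, control_points[1:]):
        if d0 <= depth <= d1:
            return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)
    raise ValueError("depth outside the control-point range")


def boundary_age_samples(depth, control_points, age_sd, n=2000, seed=0):
    """Stand-in for a Bayesian age model: jitter control ages, then interpolate."""
    rng = random.Random(seed)
    return [depth_to_age(depth, [(d, rng.gauss(a, age_sd)) for d, a in control_points])
            for _ in range(n)]


# Synthetic core: two variables that both shift at 50 m depth (made-up values).
depths = [float(d) for d in range(0, 100, 2)]
rng = random.Random(1)
rows = [(rng.gauss(1.0 if d < 50 else 5.0, 0.1),
         rng.gauss(10.0 if d < 50 else 20.0, 0.1)) for d in depths]

labels = kmeans(standardize(rows), k=2)
boundaries = cluster_boundaries(depths, labels)

# Hypothetical biostratigraphic control points as (depth in m, age in Ma).
control_points = [(0.0, 0.0), (40.0, 5.0), (100.0, 11.0)]
samples = boundary_age_samples(boundaries[0], control_points, age_sd=0.2)
mean_age = statistics.mean(samples)
sd_age = statistics.stdev(samples)
print(boundaries, round(mean_age, 2), round(sd_age, 2))
```

The mean and standard deviation of the sampled ages give the parameters of a Gaussian summary for one boundary. Repeating the same steps for each site and overlaying the resulting densities is one way to check whether boundaries from different cores fall in the same age range.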


Keywords: machine learning; ocean sediments; unsupervised classification



Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Kevin C. Tse (1, corresponding author)
  • Hon-Chim Chiu (2)
  • Man-Yin Tsang (3)
  • Yiliang Li (1)
  • Edmund Y. Lam (4)

  1. Department of Earth Sciences, The University of Hong Kong, Hong Kong, China
  2. Department of Geography and Centre for Geo-computation Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
  3. Department of Earth Sciences, University of Toronto, Toronto, Canada
  4. Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
