Automatic water mixing event identification in the Koljö fjord observatory data

  • Markus Götz
  • Mikhail Kononets
  • Christian Bodenstein
  • Morris Riedel
  • Matthias Book
  • Olafur Petur Palsson
Applications

Abstract

This study addresses the task of automatically identifying water mixing events in the multivariate time series of salinity, temperature and dissolved oxygen provided by the Koljö fjord observatory. The observatory, which has been collecting data since April 2011, is used to test new underwater sensor technology and to monitor water quality in the fjord with respect to hypoxia and oxygenation. The fjord's water properties change when inflows of new water arrive from the open sea or from rivers connected to the fjord system; these changes manifest as peaks or drops in dissolved oxygen, salinity and temperature. An acute state of oxygen depletion can permanently harm wildlife and the ecosystem. The major challenge for the analysis is that the water property changes are marked by highly varying peak strength and highly varying correlation between the signals. The proposed data-driven analysis method extends existing univariate outlier detection approaches, based on clustering techniques, to identify the water mixing events. It comprises three major steps: (1) smoothing of the input data to counter noise, (2) individual outlier detection within the separate variables, and (3) clustering of the results using the DBSCAN algorithm to determine the anomalous events. The proposed approach detects the water mixing events with an \(F_1\)-measure of 0.885, a precision of 0.931 (93.1% of the detected events are actual water mixing events) and a recall of 0.843 (84.3% of the events that should have been found were detected). Using the proposed method, oceanographers can be informed automatically about the status of the fjord without manual interaction or physical presence at the experiment site.
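The three steps named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the moving-median smoother, the median/MAD outlier rule, the choice to cluster flagged sample indices pooled across variables, and all names and parameter values (`window`, `threshold`, `min_scale`, `eps`, `min_pts`, the synthetic series) are assumptions made for illustration only.

```python
from statistics import median

def moving_median(values, window=5):
    """Step 1: centered moving median; the window shrinks at the edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def outlier_indices(values, threshold=3.0, min_scale=1e-9):
    """Step 2: indices whose robust z-score (median/MAD) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 scales the MAD to the standard deviation of a normal distribution;
    # min_scale (assumed) avoids division by zero on flat series.
    scale = max(1.4826 * mad, min_scale)
    return [i for i, v in enumerate(values) if abs(v - med) / scale > threshold]

def dbscan_1d(points, eps, min_pts):
    """Step 3: plain DBSCAN on 1-D points; returns labels, -1 means noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, p in enumerate(points) if abs(p - points[i]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:   # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

def detect_events(series_by_variable, window=5, threshold=3.0, eps=2, min_pts=4):
    """Run steps 1-3 and return (first_index, last_index) per detected event."""
    flagged = []
    for values in series_by_variable.values():
        flagged.extend(outlier_indices(moving_median(values, window), threshold))
    flagged.sort()
    labels = dbscan_1d(flagged, eps, min_pts)
    events = {}
    for index, label in zip(flagged, labels):
        if label != -1:
            events.setdefault(label, []).append(index)
    return [(min(ix), max(ix)) for ix in events.values()]

# Synthetic example (made-up numbers, not observatory data).
oxygen = [6.0] * 60
oxygen[10] = 9.0                    # single-sample glitch, removed by smoothing
oxygen[30:36] = [2.0] * 6           # oxygen drop during the mixing event
salinity = [25.0] * 60
salinity[31:37] = [28.0] * 6        # salinity peak, slightly delayed
temperature = [8.0] * 60
temperature[50:53] = [12.0] * 3     # short blip, flagged but rejected as noise

events = detect_events(
    {"oxygen": oxygen, "salinity": salinity, "temperature": temperature}
)
print(events)  # [(30, 36)]
```

In this sketch the overlapping oxygen and salinity outliers reinforce each other because their indices are pooled before clustering, whereas the isolated temperature blip has too few neighbours to form a cluster and is labelled as noise.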

Keywords

  • Multivariate time series analysis
  • Koljö fjord observatory
  • Water mixing event detection
  • Clustering
  • DBSCAN

Notes

Acknowledgements

The installation of the Koljö fjord cabled observatory was carried out by the University of Gothenburg in collaboration with MARUM, University of Bremen, Germany, and funded by the European Commission projects ESONET-NoE (contract number 036851), HYPOX (grant agreement number 226213) and EMSO (grant agreement number 211816). This work is also supported by Aanderaa Data Instruments AS, which provides the Doppler Current Profiler instruments as well as other material and financial support to run the Koljö fjord observatory.

Compliance with ethical standards

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Juelich Supercomputing Center, Research Center Juelich, Jülich, Germany
  2. Department of Marine Science, University of Gothenburg, Göteborg, Sweden
  3. Mechanical Engineering and Computer Science, Faculty for Industrial Engineering, University of Iceland, Reykjavík, Iceland
