Lagrangian analysis by clustering

Koszalka, Inga Monika; LaCasce, Joseph H.

doi:10.1007/s10236-010-0306-2

Published: 09 June 2010

Volume 60, pages 957–972, (2010)

Ocean Dynamics

Inga Monika Koszalka¹ &
Joseph H. LaCasce¹

Abstract

We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian data. Rather than grouping the drifter-derived velocities in geographical bins, we group them by nearest-neighbor distance using a clustering algorithm. This yields sets with approximately the same number of observations, covering unequal areas. A major advantage is that, because the number of observations is the same for the clusters, the statistical accuracy is more uniform than with geographical bins. We illustrate the technique using synthetic data from a stochastic model, employing a realistic mean flow. The latter represents the surface currents in the Nordic Seas and is strongly inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities and compare the results with the corresponding quantities from the stochastic model. We perform a similar comparison with the means and diffusivities obtained with geographical bins. Clustering is more successful at capturing the mean flow and improves convergence in the eddy diffusivity estimates. We discuss both the advantages and shortcomings of the new method.

Notes

Poulain et al. (1996) found T_L = 1–3 days here, while Andersson et al. (submitted for publication) estimated T_L = 1.1 days. LaCasce (2005) found that the Eulerian integral time is 1 to 2 days, which implies an equal or shorter Lagrangian time.
The bin dimensions are listed as (degrees longitude × degrees latitude). With (2° × 1°), the bins are close to square in the southern part of the domain but are more rectangular in the north.
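
As a rough check on the bin geometry described in the second note, the sketch below converts a 2° × 1° bin to kilometres at two latitudes. The latitudes (60°N and 75°N), the spherical-Earth constant of about 111.2 km per degree of latitude, and the function name `bin_size_km` are illustrative assumptions, not values taken from the article.

```python
import math

# Assumption: spherical Earth, about 111.2 km per degree of latitude.
KM_PER_DEG_LAT = 111.2


def bin_size_km(dlon_deg, dlat_deg, lat_deg):
    """Return (zonal, meridional) extent in km of a lon/lat bin centered at lat_deg."""
    # A degree of longitude shrinks with the cosine of latitude.
    zonal = dlon_deg * KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))
    meridional = dlat_deg * KM_PER_DEG_LAT
    return zonal, meridional


# Illustrative latitudes only; the article does not list them in these notes.
for lat in (60.0, 75.0):
    zonal, meridional = bin_size_km(2.0, 1.0, lat)
    print(f"{lat:.0f}N: {zonal:.0f} km x {meridional:.0f} km")
```

Under these assumptions the bin is roughly 111 km × 111 km at 60°N but only about 58 km × 111 km at 75°N, which is the sense in which the bins become more rectangular toward the north.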

References

Bauer S, Swenson MS, Griffa A, Mariano AJ, Owens K (1998) Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean. J Geophys Res 103(C13):30855–30871
Bauer S, Swenson MS, Griffa A (2002) Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean: 2. Results. J Geophys Res 107(C10):3154
Brink KH, Beardsley RC, Paduan J, Limeburner R, Caruso M, Sires JG (2000) A view of the 1993–1994 California Current based on surface drifters, floats, and remotely sensed data. J Geophys Res 105(C4):8575–8604
Colin de Verdiere A (1983) Lagrangian eddy statistics from surface drifters in the eastern North Atlantic. J Mar Res 41:375–398
Davis RE (1991) Observing the general circulation with floats. Deep-Sea Res Suppl 38:S531–S571
Davis RE (1998) Preliminary results from directly measuring mid-depth circulation in the Tropical and South Pacific. J Geophys Res 103:24619–24639
Falco P, Griffa A, Poulain P-M, Zambianchi E (2000) Transport properties in the Adriatic Sea as deduced from drifter data. J Phys Oceanogr 30:2055–2071
Fratantoni DM (2001) North Atlantic surface circulation during the 1990’s observed with satellite-tracked drifters. J Geophys Res 106(C10):22067–22093
Garraffo Z, Griffa A, Mariano AJ, Chassignet EP (2001) Lagrangian data in a high-resolution numerical simulation of the North Atlantic II. On the pseudo-Eulerian averaging of Lagrangian data. J Mar Syst 29:177–200
Griffa A (1996) Applications of stochastic particle models to oceanographical problems. In: Adler R, Muller P, Rozovskii B (eds) Stochastic modelling in physical oceanography. Birkhauser, Boston, pp 114–140
Jakobsen PK, Ribergaard MH, Quadfasel D, Schmith T, Hughes CW (2003) Near-surface circulation in the northern North Atlantic as inferred from Lagrangian drifters: variability from the mesoscale to interannual. J Geophys Res 108(C5):3251
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892
Koszalka I, LaCasce JH, Orvik KA (2009) Relative dispersion in the Nordic Seas. J Mar Res 67:411–433
LaCasce J (2005) Statistics of low frequency currents over the western Norwegian shelf and slope I: current meters. Ocean Model 55:213–221
LaCasce J (2008) Statistics from Lagrangian observations. Prog Oceanogr 77(1):1–29
LaCasce J, Engedahl H (2005) Statistics of low frequency currents over the western Norwegian shelf and slope II: model. Ocean Model 55:222–237
LaCasce JH (2000) Floats and f/H. J Mar Res 58:61–95
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
Lumpkin R (2003) Decomposition of surface drifter observations in the Atlantic Ocean. Geophys Res Lett 30(14):1753
Lumpkin R, Flament P (2001) Lagrangian statistics in the central North Pacific. J Mar Syst 29:141–155
Lumpkin R, Garraffo Z (2005) Evaluating the decomposition of Tropical Atlantic drifter observations. J Phys Oceanogr 22:1403–1415
Lumpkin R, Treguier A-M, Speer K (2002) Lagrangian eddy scales in the Northern Atlantic Ocean. J Phys Oceanogr 32:2425–2440
MacKay DJC (2003) Information theory, inference, and learning algorithms. Cambridge University Press, Cambridge
Mariano A, Ryan E (2007) Lagrangian analysis and prediction of coastal and ocean dynamics (LAPCOD review). In Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian analysis and prediction of coastal and ocean dynamics, Chapter 13. Cambridge University Press, Cambridge, pp 423–467
Orvik KA, Niiler P (2002) Major pathways of Atlantic Water in the northern North Atlantic and Nordic Seas towards Arctic. Geophys Res Lett 29(19):1896
Owens WB (1991) A statistical description of the mean circulation and eddy variability in the northwestern North Atlantic using SOFAR floats. Prog Oceanogr 28:257–303
Poulain P-M (2001) Adriatic Sea surface circulation as derived from drifter data between 1990 and 1999. J Mar Syst 29:3–32
Poulain P-M, Warn-Varnas A, Niiler PP (1996) Near-surface circulation of the Nordic Seas as measured by Lagrangian drifters. J Geophys Res 101:18237–18258
Rossby HT, Riser SC, Mariano AJ (1983) The western North Atlantic—a Lagrangian viewpoint. In: Robinson AR (ed) Eddies in marine science. Springer, Heidelberg, pp 66–91
Rupolo V (2007) Observing turbulence regimes and Lagrangian dispersal properties in the oceans. In Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian analysis and prediction of coastal and ocean dynamics, Chapter 9. Cambridge University Press, Cambridge, pp 231–274
Saetre R (1999) Features of the central Norwegian shelf circulation. Cont Shelf Res 19:1809–1831
Sallee JB, Speer K, Morrow R, Lumpkin R (2008) An estimate of Lagrangian eddy statistics and diffusion in the mixed layer of the Southern Ocean. J Mar Res 66:441–463
Skagseth Ø, Orvik KA (2002) Identifying fluctuations in the Norwegian Atlantic Slope Current by means of empirical orthogonal functions. Cont Shelf Res 22:547–563
Swenson MS, Niiler PP (1996) Statistical analysis of the surface circulation of the California Current. J Geophys Res 101(C10):22631–22645
Taylor GI (1921) Diffusion by continuous movements. Proc Lond Math Soc 20:196–212
Thompson A, Heywood KJ, Thorpe SE, Renner AH, Trasvina A (2009) Surface circulation at the tip of the Antarctic Peninsula from drifters. J Phys Oceanogr 39:3–25
Veneziani M, Griffa A, Reynolds AM, Mariano AJ (2004) Oceanic turbulence and stochastic models from subsurface Lagrangian data for the Northwest Atlantic Ocean. J Phys Oceanogr 34:1884–1906
Zhurbas V, Oh IS (2003) Lateral diffusivity and Lagrangian scales in the Pacific Ocean as derived from drifter data. J Geophys Res 108(C5):3141

Acknowledgements

The work is part of the Poleward project, funded by the Norwegian Research Council Norklima program (grant number 178559/S30). Details are found on http://www.iaoos.no/ and http://folk.uio.no/ingako/my_files/POLEWARD_WEBPAGE_MAIN.html. Harald Engedahl provided the MIPOM velocities. We appreciate useful comments from two anonymous reviewers.

Author information

Authors and Affiliations

Department of Geosciences, University of Oslo, P.O. Box 1022, Blindern, 0315, Oslo, Norway
Inga Monika Koszalka & Joseph H. LaCasce

Corresponding author

Correspondence to Inga Monika Koszalka.

Additional information

Responsible Editor: John Grue

Appendix: The clustering algorithm

We base our clustering procedure on a generalized version of Lloyd’s (1982) algorithm for the problem described by Eq. 5. However, in contrast to conventional applications of k-means (MacKay 2003), the number of clusters k does not need to be guessed in our problem; it is deduced from the total amount of data so as to match the desired number of cluster members m. Hence, we have developed a procedure to partition the data into clusters whose number of members is as close as possible to a prescribed value m. This heuristic numerical solution is possibly not optimal, but it performed well for the purpose of this study. The implementation uses the MATLAB k-means toolbox, modified accordingly. The steps of the algorithm are as follows:

Choose the desired number of members in a cluster, m
Given the total number of independent observations n and m, compute the target number of clusters, k = n/m
Start k-means procedure (“batch phase”)
- Randomly seed a set of k cluster centers
- Assign each point to the nearest cluster center, minimizing the squared Euclidean distance in geographical coordinates (Eq. 5)
- Recompute the cluster centers
- Repeat the two previous steps until the convergence criterion is met (the assignment has not changed, or the maximum number of iterations, set to 200 here, is reached)
- Repeat the four previous steps 100 times (for 100 initial seedings, or “replicates”); the output is the “best solution”, that is, the one with the lowest value found of the sum of within-cluster distances, summed over all clusters
End k-means procedure
Clusters with the desired number of members are removed from consideration and stored, and the entire clustering procedure is repeated on the remaining, smaller data set. The process continues until all the data are grouped in clusters whose number of members lies in the interval (m − 5, m + 5), or until the maximum number of iterations, 400, is reached. The requirement was not met in some subsets, typically clusters peripheral to the data-covered area. These were still included in the further analysis, which makes the distribution curves in Fig. 4b differ from delta functions.
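
The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' modified MATLAB k-means code: the function names (`lloyd_kmeans`, `best_kmeans`, `cluster_by_size`) and parameter names are hypothetical, positions are treated as plain two-column coordinates, and the early exit when too few points remain is a shortcut added here for efficiency.

```python
import numpy as np


def lloyd_kmeans(points, k, rng, max_iter=200):
    """One k-means run (Lloyd's algorithm); returns (labels, summed squared distance)."""
    # Seed k centers from randomly chosen data points (fancy indexing makes a copy).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # assignment unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    cost = ((points - centers[labels]) ** 2).sum()
    return labels, cost


def best_kmeans(points, k, rng, replicates=100, max_iter=200):
    """Repeat k-means for several random seedings and keep the lowest-cost labels."""
    runs = [lloyd_kmeans(points, k, rng, max_iter) for _ in range(replicates)]
    return min(runs, key=lambda run: run[1])[0]


def cluster_by_size(points, m, tol=5, replicates=100, max_rounds=400, seed=0):
    """Partition points into clusters with roughly m members each.

    Returns a list of index arrays into `points`. Clusters whose size lies in
    (m - tol, m + tol) are stored; the rest are re-clustered in the next round.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(points))
    clusters, leftovers = [], []
    for _ in range(max_rounds):
        if remaining.size == 0:
            break
        if remaining.size <= m - tol:
            # Assumption (not in the source): too few points are left to form
            # an acceptable cluster, so keep them as one undersized cluster.
            clusters.append(remaining)
            break
        k = max(1, round(remaining.size / m))  # target number of clusters
        labels = best_kmeans(points[remaining], k, rng, replicates)
        groups = [remaining[labels == j] for j in range(k)]
        groups = [g for g in groups if g.size]
        ok = [m - tol < g.size < m + tol for g in groups]
        clusters.extend(g for g, good in zip(groups, ok) if good)
        leftovers = [g for g, good in zip(groups, ok) if not good]
        remaining = np.concatenate(leftovers) if leftovers else remaining[:0]
    else:
        # Round limit reached: keep the last off-size clusters, as in the text.
        clusters.extend(leftovers)
    return clusters
```

A call such as `cluster_by_size(xy, m=30, replicates=10)` on an `(n, 2)` array `xy` returns index arrays that together cover every observation exactly once. The defaults mirror the values quoted in the list (200 iterations, 100 replicates, 400 rounds, a tolerance of 5), so smaller values are advisable for quick experiments.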

The large number of iterations and the requirement of uniform splitting of the data make the analysis computationally intensive. For that reason, we do not check for a “local minimum” (in terms of Eq. 5) by a series of reassignments of points between clusters. Nevertheless, we found that repeated runs of the entire procedure described above led merely to slightly different arrangements of clusters, while the reported results from the Z-test (Fig. 5) changed only within ±2%.

The running time of the entire procedure was ca. 6 h on an x86_64 GNU/Linux machine with 32 GB RAM.

About this article

Cite this article

Koszalka, I.M., LaCasce, J.H. Lagrangian analysis by clustering. Ocean Dynamics 60, 957–972 (2010). https://doi.org/10.1007/s10236-010-0306-2

Received: 15 February 2010
Accepted: 21 May 2010
Published: 09 June 2010
Issue Date: August 2010
DOI: https://doi.org/10.1007/s10236-010-0306-2
