Lagrangian analysis by clustering

Ocean Dynamics

Abstract

We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian data. Rather than grouping the drifter-derived velocities in geographical bins, we group them by nearest-neighbor distance using a clustering algorithm. This yields sets with approximately the same number of observations, covering unequal areas. A major advantage is that, because the number of observations is the same for the clusters, the statistical accuracy is more uniform than with geographical bins. We illustrate the technique using synthetic data from a stochastic model, employing a realistic mean flow. The latter represents the surface currents in the Nordic Seas and is strongly inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities and compare the results with the corresponding quantities from the stochastic model. We perform a similar comparison with the means and diffusivities obtained with geographical bins. Clustering is more successful at capturing the mean flow and improves convergence in the eddy diffusivity estimates. We discuss both the advantages and shortcomings of the new method.

Notes

  1. Poulain et al. (1996) found T_L = 1–3 days here, while Andersson et al. (submitted for publication) estimated T_L = 1.1 days. LaCasce (2005) found that the Eulerian integral time is 1 to 2 days, which implies an equal or shorter Lagrangian time.

  2. Bin dimensions are listed as (degrees longitude × degrees latitude). With (2° × 1°), the bins are close to square in the southern part of the domain but become more rectangular toward the north.
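
The change in bin shape with latitude in note 2 can be checked with a short calculation. The sketch below is illustrative only: it assumes a spherical Earth of radius 6371 km, and the latitudes 60°N and 75°N are assumed values chosen to represent the southern and northern parts of a high-latitude domain, not values taken from the paper. The function name `bin_size_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def bin_size_km(lat_deg, dlon_deg=2.0, dlat_deg=1.0):
    """Return (east-west width, north-south height) in km of a lon/lat bin."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    # Meridians converge poleward, so the width shrinks with cos(latitude).
    width = dlon_deg * km_per_deg * math.cos(math.radians(lat_deg))
    height = dlat_deg * km_per_deg
    return width, height


# Illustrative latitudes (assumed, not from the paper).
for lat in (60.0, 75.0):
    w, h = bin_size_km(lat)
    print(f"{lat:.0f}N: {w:.0f} km x {h:.0f} km")
# prints:
# 60N: 111 km x 111 km
# 75N: 58 km x 111 km
```

Under these assumptions a (2° × 1°) bin is square at 60°N and roughly half as wide as it is tall at 75°N.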

References

  • Bauer S, Swenson MS, Griffa A, Mariano AJ, Owens K (1998) Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean. J Geophys Res 103(C13):30855–30871

  • Bauer S, Swenson MS, Griffa A (2002) Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean: 2. Results. J Geophys Res 107(C10):3154

  • Brink KH, Beardsley RC, Paduan J, Limeburner R, Caruso M, Sires JG (2000) A view of the 1993–1994 California Current based on surface drifters, floats, and remotely sensed data. J Geophys Res 105(C4):8575–8604

  • Colin de Verdiere A (1983) Lagrangian eddy statistics from surface drifters in the eastern North Atlantic. J Mar Res 41:375–398

  • Davis RE (1991) Observing the general circulation with floats. Deep-Sea Res Suppl 38:S531–S571

  • Davis RE (1998) Preliminary results from directly measuring mid-depth circulation in the Tropical and South Pacific. J Geophys Res 103:24619–24639

  • Falco P, Griffa A, Poulain P-M, Zambianchi E (2000) Transport properties in the Adriatic Sea as deduced from drifter data. J Phys Oceanogr 30:2055–2071

  • Fratantoni DM (2001) North Atlantic surface circulation during the 1990’s observed with satellite-tracked drifters. J Geophys Res 106(C10):22067–22093

  • Garraffo Z, Griffa A, Mariano AJ, Chassignet EP (2001) Lagrangian data in a high-resolution numerical simulation of the North Atlantic II. On the pseudo-Eulerian averaging of Lagrangian data. J Mar Syst 29:177–200

  • Griffa A (1996) Applications of stochastic particle models to oceanographical problems. In: Adler R, Muller P, Rozovskii B (eds) Stochastic modelling in physical oceanography. Birkhauser, Boston, pp 114–140

  • Jakobsen PK, Ribergaard MH, Quadfasel D, Schmith T, Hughes CW (2003) Near-surface circulation in the northern North Atlantic as inferred from Lagrangian drifters: variability from the mesoscale to interannual. J Geophys Res 108(C5):3251

  • Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892

  • Koszalka I, LaCasce JH, Orvik KA (2009) Relative dispersion in the Nordic Seas. J Mar Res 67:411–433

  • LaCasce J (2005) Statistics of low frequency currents over the western Norwegian shelf and slope I: current meters. Ocean Model 55:213–221

  • LaCasce J (2008) Statistics from Lagrangian observations. Prog Oceanogr 77(1):1–29

  • LaCasce J, Engedahl H (2005) Statistics of low frequency currents over the western Norwegian shelf and slope II: model. Ocean Model 55:222–237

  • LaCasce JH (2000) Floats and f/H. J Mar Res 58:61–95

  • Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137

  • Lumpkin R (2003) Decomposition of surface drifter observations in the Atlantic Ocean. Geophys Res Lett 30(14):1753

  • Lumpkin R, Flament P (2001) Lagrangian statistics in the central North Pacific. J Mar Syst 29:141–155

  • Lumpkin R, Garraffo Z (2005) Evaluating the decomposition of Tropical Atlantic drifter observations. J Phys Oceanogr 22:1403–1415

  • Lumpkin R, Treguier A-M, Speer K (2002) Lagrangian eddy scales in the Northern Atlantic Ocean. J Phys Oceanogr 32:2425–2440

  • MacKay DJC (2003) Information theory, inference, and learning algorithms. Cambridge University Press, Cambridge

  • Mariano A, Ryan E (2007) Lagrangian analysis and prediction of coastal and ocean dynamics (LAPCOD review). In: Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian analysis and prediction of coastal and ocean dynamics, Chapter 13. Cambridge University Press, Cambridge, pp 423–467

  • Orvik KA, Niiler P (2002) Major pathways of Atlantic Water in the northern North Atlantic and Nordic Seas towards Arctic. Geophys Res Lett 29(19):1896

  • Owens WB (1991) A statistical description of the mean circulation and eddy variability in the northwestern North Atlantic using SOFAR floats. Prog Oceanogr 28:257–303

  • Poulain P-M (2001) Adriatic Sea surface circulation as derived from drifter data between 1990 and 1999. J Mar Syst 29:3–32

  • Poulain P-M, Warn-Varnas A, Niiler PP (1996) Near-surface circulation of the Nordic Seas as measured by Lagrangian drifters. J Geophys Res 101:18237–18258

  • Rossby HT, Riser SC, Mariano AJ (1983) The western North Atlantic—a Lagrangian viewpoint. In: Robinson AR (ed) Eddies in marine science. Springer, Heidelberg, pp 66–91

  • Rupolo V (2007) Observing turbulence regimes and Lagrangian dispersal properties in the oceans. In: Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian analysis and prediction of coastal and ocean dynamics, Chapter 9. Cambridge University Press, Cambridge, pp 231–274

  • Saetre R (1999) Features of the central Norwegian shelf circulation. Cont Shelf Res 19:1809–1831

  • Sallee JB, Speer K, Morrow R, Lumpkin R (2008) An estimate of Lagrangian eddy statistics and diffusion in the mixed layer of the Southern Ocean. J Mar Res 66:441–463

  • Skagseth Ø, Orvik KA (2002) Identifying fluctuations in the Norwegian Atlantic Slope Current by means of empirical orthogonal functions. Cont Shelf Res 22:547–563

  • Swenson MS, Niiler PP (1996) Statistical analysis of the surface circulation of the California Current. J Geophys Res 101(C10):22631–22645

  • Taylor GI (1921) Diffusion by continuous movements. Proc Lond Math Soc 20:196–212

  • Thompson A, Heywood KJ, Thorpe SE, Renner AH, Trasvina A (2009) Surface circulation at the tip of the Antarctic Peninsula from drifters. J Phys Oceanogr 39:3–25

  • Veneziani M, Griffa A, Reynolds AM, Mariano AJ (2004) Oceanic turbulence and stochastic models from subsurface Lagrangian data for the Northwest Atlantic Ocean. J Phys Oceanogr 34:1884–1906

  • Zhurbas V, Oh IS (2003) Lateral diffusivity and Lagrangian scales in the Pacific Ocean as derived from drifter data. J Geophys Res 108(C5):3141


Acknowledgements

This work is part of the Poleward project, funded by the Norwegian Research Council Norklima program (grant number 178559/S30). Details can be found at http://www.iaoos.no/ and http://folk.uio.no/ingako/my_files/POLEWARD_WEBPAGE_MAIN.html. Harald Engedahl provided the MIPOM velocities. We appreciate the useful comments from two anonymous reviewers.

Author information

Corresponding author

Correspondence to Inga Monika Koszalka.

Additional information

Responsible Editor: John Grue

Appendix: The clustering algorithm

We base our clustering procedure on a generalized version of Lloyd’s (1982) algorithm for the problem described by Eq. 5. In contrast to conventional applications of k-means (MacKay 2003), the number of clusters k does not need to be guessed here; it is deduced from the total amount of data so as to match the desired number of cluster members m. We have therefore developed a procedure that partitions the data into clusters whose number of members is as close as possible to a prescribed value m. This heuristic numerical solution may not be optimal, but it performed well for the purposes of this study. The implementation uses the MATLAB k-means toolbox, modified accordingly. The steps of the algorithm are as follows:

  • Choose the desired number of members in a cluster, m

  • Given the total number of independent observations n and m, compute the target number of clusters, k=n/m

  • Start k-means procedure (“batch phase”)

    • Seed a set of k cluster centers at random

    • Assign each point to the nearest cluster center minimizing the squared Euclidean distance in geographical coordinates (Eq. 5)

    • Recompute the new cluster centers

    • Repeat the two previous steps until the convergence criterion is met (the assignment no longer changes, or the maximum number of iterations, set to 200 here, is reached)

    • Repeat the four previous steps 100 times (for 100 initial seedings, or “replicates”) and output the “best solution” (taken as the global minimum, that is, the lowest value of the sum of within-cluster distances, summed over all clusters)

  • End k-means procedure

  • Clusters with the desired number of members are removed from consideration and stored, and the entire clustering procedure is repeated on the smaller data set. The process continues until all the data are grouped in clusters whose number of members lies in the interval (m − 5, m + 5), or until the maximum number of iterations, 400, is reached. The requirement was not met in some subsets, typically those involving clusters on the periphery of the data-covered area. These were still included in the further analysis, which is why the distribution curves in Fig. 4b differ from delta functions.
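
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the modified MATLAB toolbox code used in the study: the function names (`sqdist`, `lloyd_kmeans`, `equal_size_clusters`) and parameter names are hypothetical, k is rounded to the nearest integer (the text gives only k = n/m), an emptied cluster simply keeps its previous center, and distances are plain Euclidean distances in (longitude, latitude) coordinates.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two (lon, lat) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def lloyd_kmeans(points, k, rng, max_iter=200):
    """One replicate of the batch phase; returns (labels, within-cluster cost)."""
    centers = rng.sample(points, k)  # random seeding from the data
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignment unchanged: converged
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an emptied cluster keeps its old center
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    cost = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, cost


def equal_size_clusters(points, m, tol=5, replicates=100, max_rounds=400, seed=0):
    """Peel off clusters with about m members; returns (accepted, rejected)."""
    rng = random.Random(seed)
    remaining, accepted, rejected = list(points), [], []
    for _ in range(max_rounds):
        if not remaining:
            break
        k = max(1, round(len(remaining) / m))  # assumption: round n/m
        runs = [lloyd_kmeans(remaining, k, rng) for _ in range(replicates)]
        labels, _cost = min(runs, key=lambda run: run[1])  # best replicate
        groups = [[p for p, lab in zip(remaining, labels) if lab == j]
                  for j in range(k)]
        # Store clusters whose size lies in (m - tol, m + tol); retry the rest.
        accepted += [g for g in groups if abs(len(g) - m) < tol]
        rejected = [g for g in groups if g and abs(len(g) - m) >= tol]
        remaining = [p for g in rejected for p in g]
    return accepted, rejected
```

Calling `equal_size_clusters(points, m=20)` on a list of `(lon, lat)` tuples returns the stored clusters together with any off-size clusters left when the round limit is reached; in the procedure described above, such leftover clusters were still included in the analysis. The default replicate and round counts mirror the values quoted in the list and are slow in pure Python, so smaller values are advisable for experimentation.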

The large number of iterations and the requirement of uniform splitting of the data make the analysis computationally intensive. For that reason, we do not check for a “local minimum” (in terms of Eq. 5) by a series of reassignments of points between clusters. Nevertheless, we found that repeated runs of the entire procedure described above merely led to a slightly different arrangement of clusters, while the reported results from the Z-test (Fig. 5) changed only within ±2%.

The running time of the entire procedure was ca. 6 h on an x86_64 GNU/Linux machine with 32 GB RAM.

About this article

Cite this article

Koszalka, I.M., LaCasce, J.H. Lagrangian analysis by clustering. Ocean Dynamics 60, 957–972 (2010). https://doi.org/10.1007/s10236-010-0306-2

  • DOI: https://doi.org/10.1007/s10236-010-0306-2
