Abstract
We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian data. Rather than grouping the drifter-derived velocities in geographical bins, we group them by nearest-neighbor distance using a clustering algorithm. This yields sets with approximately the same number of observations, covering unequal areas. A major advantage is that, because the number of observations is the same for the clusters, the statistical accuracy is more uniform than with geographical bins. We illustrate the technique using synthetic data from a stochastic model, employing a realistic mean flow. The latter represents the surface currents in the Nordic Seas and is strongly inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities and compare the results with the corresponding quantities from the stochastic model. We perform a similar comparison with the means and diffusivities obtained with geographical bins. Clustering is more successful at capturing the mean flow and improves convergence in the eddy diffusivity estimates. We discuss both the advantages and shortcomings of the new method.
Notes
The dimensions are listed (degrees longitude × degrees latitude). With (2° × 1°), the bins are close to square in the southern part of the domain but are more rectangular in the north.
References
Bauer S, Swenson MS, Griffa A, Mariano AJ, Owens K (1998) Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean. J Geophys Res 103(C13):30855–30871
Bauer S, Swenson MS, Griffa A (2002) Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean: 2. Results. J Geophys Res 107(C10):3154
Brink KH, Beardsley RC, Paduan J, Limeburner R, Caruso M, Sires JG (2000) A view of the 1993–1994 California Current based on surface drifters, floats, and remotely sensed data. J Geophys Res 105(C4):8575–8604
Colin de Verdiere A (1983) Lagrangian eddy statistics from surface drifters in the eastern North Atlantic. J Mar Res 41:375–398
Davis RE (1991) Observing the general circulation with floats. Deep-Sea Res Suppl 38:S531–S571
Davis RE (1998) Preliminary results from directly measuring mid-depth circulation in the Tropical and South Pacific. J Geophys Res 103:24619–24639
Falco P, Griffa A, Poulain P-M, Zambianchi E (2000) Transport properties in the Adriatic Sea as deduced from drifter data. J Phys Oceanogr 30:2055–2071
Fratantoni DM (2001) North Atlantic surface circulation during the 1990’s observed with satellite-tracked drifters. J Geophys Res 106(C10):22067–22093
Garraffo Z, Griffa A, Mariano AJ, Chassignet EP (2001) Lagrangian data in a high-resolution numerical simulation of the North Atlantic II. On the pseudo-Eulerian averaging of Lagrangian data. J Mar Syst 29:177–200
Griffa A (1996) Applications of stochastic particle models to oceanographical problems. In: Adler R, Muller P, Rozovskii B (eds) Stochastic modelling in physical oceanography. Birkhauser, Boston, pp 114–140
Jakobsen PK, Ribergaard MH, Quadfasel D, Schmith T, Hughes CW (2003) Near-surface circulation in the northern North Atlantic as inferred from Lagrangian drifters: variability from the mesoscale to interannual. J Geophys Res 108(C5):3251
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892
Koszalka I, LaCasce JH, Orvik KA (2009) Relative dispersion in the Nordic Seas. J Mar Res 67:411–433
LaCasce J (2005) Statistics of low frequency currents over the western Norwegian shelf and slope I: current meters. Ocean Dyn 55:213–221
LaCasce J (2008) Statistics from Lagrangian observations. Prog Oceanogr 77(1):1–29
LaCasce J, Engedahl H (2005) Statistics of low frequency currents over the western Norwegian shelf and slope II: model. Ocean Dyn 55:222–237
LaCasce JH (2000) Floats and f/H. J Mar Res 58:61–95
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
Lumpkin R (2003) Decomposition of surface drifter observations in the Atlantic Ocean. Geophys Res Lett 30(14):1753
Lumpkin R, Flament P (2001) Lagrangian statistics in the central North Pacific. J Mar Syst 29:141–155
Lumpkin R, Garraffo Z (2005) Evaluating the decomposition of Tropical Atlantic drifter observations. J Atmos Ocean Technol 22:1403–1415
Lumpkin R, Treguier A-M, Speer K (2002) Lagrangian eddy scales in the Northern Atlantic Ocean. J Phys Oceanogr 32:2425–2440
MacKay DJC (2003) Information theory, inference, and learning algorithms. Cambridge University Press, Cambridge
Mariano A, Ryan E (2007) Lagrangian analysis and prediction of coastal and ocean dynamics (LAPCOD review). In Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian analysis and prediction of coastal and ocean dynamics, Chapter 13. Cambridge University Press, Cambridge, pp 423–467
Orvik KA, Niiler P (2002) Major pathways of Atlantic Water in the northern North Atlantic and Nordic Seas towards Arctic. Geophys Res Lett 29(19):1896
Owens WB (1991) A statistical description of the mean circulation and eddy variability in the northwestern North Atlantic using SOFAR floats. Prog Oceanogr 28:257–303
Poulain P-M (2001) Adriatic Sea surface circulation as derived from drifter data between 1990 and 1999. J Mar Syst 29:3–32
Poulain P-M, Warn-Varnas A, Niiler PP (1996) Near-surface circulation of the Nordic Seas as measured by Lagrangian drifters. J Geophys Res 101:18237–18258
Rossby HT, Riser SC, Mariano AJ (1983) The western North Atlantic—a Lagrangian viewpoint. In: Robinson AR (ed) Eddies in marine science. Springer, Heidelberg, pp 66–91
Rupolo V (2007) Observing turbulence regimes and Lagrangian dispersal properties in the oceans. In Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian analysis and prediction of coastal and ocean dynamics, Chapter 9. Cambridge University Press, Cambridge, pp 231–274
Saetre R (1999) Features of the central Norwegian shelf circulation. Cont Shelf Res 19:1809–1831
Sallee JB, Speer K, Morrow R, Lumpkin R (2008) An estimate of Lagrangian eddy statistics and diffusion in the mixed layer of the Southern Ocean. J Mar Res 66:441–463
Skagseth Ø, Orvik KA (2002) Identifying fluctuations in the Norwegian Atlantic Slope Current by means of empirical orthogonal functions. Cont Shelf Res 22:547–563
Swenson MS, Niiler PP (1996) Statistical analysis of the surface circulation of the California Current. J Geophys Res 101(C10):22631–22645
Taylor GI (1921) Diffusion by continuous movements. Proc Lond Math Soc 20:196–212
Thompson A, Heywood KJ, Thorpe SE, Renner AH, Trasvina A (2009) Surface circulation at the tip of the Antarctic Peninsula from drifters. J Phys Oceanogr 39:3–25
Veneziani M, Griffa A, Reynolds AM, Mariano AJ (2004) Oceanic turbulence and stochastic models from subsurface Lagrangian data for the Northwest Atlantic Ocean. J Phys Oceanogr 34:1884–1906
Zhurbas V, Oh IS (2003) Lateral diffusivity and Lagrangian scales in the Pacific Ocean as derived from drifter data. J Geophys Res 108(C5):3141
Acknowledgements
The work is part of the Poleward project, funded by the Norwegian Research Council Norklima program (grant number 178559/S30). Details are found on http://www.iaoos.no/ and http://folk.uio.no/ingako/my_files/POLEWARD_WEBPAGE_MAIN.html. Harald Engedahl provided the MIPOM velocities. We appreciate useful comments from two anonymous reviewers.
Additional information
Responsible Editor: John Grue
Appendix: The clustering algorithm
We base our clustering procedure on a generalized version of Lloyd’s (1982) algorithm for the problem described by Eq. 5. However, contrary to conventional applications of k-means (MacKay 2003), in our problem the number of clusters k does not need to be guessed; it is deduced from the total amount of data to match the desired number of cluster members m. Hence, we have developed a procedure to partition the data into clusters whose number of members is as close as possible to a prescribed value m. This heuristic numerical solution is possibly not optimal, but it performed well for the purpose of this study. The implementation uses the MATLAB k-means toolbox, modified accordingly. The steps of the algorithm are as follows:
- Choose the desired number of members in a cluster, m.
- Given the total number of independent observations n and m, compute the target number of clusters, k = n/m.
- Start the k-means procedure (“batch phase”):
  - Seed k cluster centers at random.
  - Assign each point to the nearest cluster center, minimizing the squared Euclidean distance in geographical coordinates (Eq. 5).
  - Recompute the cluster centers.
  - Repeat the two previous steps until the convergence criterion is met (the assignment no longer changes, or the maximum number of iterations, set to 200 here, is reached).
  - Repeat the four previous steps 100 times (for 100 initial seedings, or “replicates”) and output the “best solution” (global minimum, that is, the lowest value of the within-cluster distances summed over all clusters).
- End the k-means procedure.
- Remove the clusters with the desired number of members from consideration and store them, then repeat the entire clustering procedure on the smaller data set. The process continues until all the data are grouped in clusters whose number of members lies in the range (m − 5, m + 5), or until the maximum number of iterations, 400, is reached. The requirement was not met in some subsets, typically clusters peripheral to the data-covered area. These were still included in the further analysis, which makes the distribution curves in Fig. 4b differ from delta functions.
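The steps above can be sketched in a few lines of code. The following is a minimal illustration in plain Python, not the modified MATLAB toolbox used in the study. The function names (`sqdist`, `lloyd_kmeans`, `cluster_equal_size`) and their parameters are hypothetical, and several details are assumptions: centers are seeded from randomly chosen data points, k is rounded to the nearest integer, a center that loses all its members is left where it is, and points are treated as (longitude, latitude) pairs with plain Euclidean distance.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two (lon, lat) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def lloyd_kmeans(points, k, max_iter, rng):
    """One k-means run (Lloyd iteration); returns (labels, cost)."""
    centers = rng.sample(points, k)  # assumption: seed from the data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignment unchanged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    cost = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, cost


def cluster_equal_size(points, m, tol=5, max_outer=400,
                       replicates=100, max_iter=200, seed=0):
    """Partition points into clusters of about m members each.

    Returns (accepted, pending): clusters whose size lies within
    (m - tol, m + tol), and the leftover clusters that do not.
    """
    rng = random.Random(seed)
    remaining = list(points)
    accepted, pending = [], []
    for _ in range(max_outer):
        if not remaining:
            break
        k = max(1, round(len(remaining) / m))  # target number of clusters
        # Keep the replicate with the lowest summed within-cluster distance.
        labels, _ = min((lloyd_kmeans(remaining, k, max_iter, rng)
                         for _ in range(replicates)), key=lambda r: r[1])
        groups = [[] for _ in range(k)]
        for p, lab in zip(remaining, labels):
            groups[lab].append(p)
        # Store clusters of the desired size; re-cluster everything else.
        accepted += [g for g in groups if abs(len(g) - m) < tol]
        pending = [g for g in groups if g and abs(len(g) - m) >= tol]
        remaining = [p for g in pending for p in g]
    return accepted, pending
```

Calling `cluster_equal_size(points, m=20)` on a list of (longitude, latitude) tuples returns the stored clusters together with any leftover clusters that never met the size requirement; as in the text, the latter can still be carried into the subsequent analysis. Because this is unoptimized pure Python, smaller values of `replicates` and `max_outer` are advisable for anything beyond a few hundred points.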
The large number of iterations and the requirement of uniform splitting of the data make the analysis computationally intensive. For that reason, we do not perform a check for a “local minimum” (in terms of Eq. 5) by a series of reassignments of the points between clusters. Nevertheless, we found that repeated runs of the entire procedure described above led merely to a slightly different arrangement of clusters, while the reported results from the Z-test (Fig. 5) changed only within ±2%.
The running time of the entire procedure was ca. 6 h on an x86_64 GNU/Linux machine with 32 GB RAM.
Cite this article
Koszalka, I.M., LaCasce, J.H. Lagrangian analysis by clustering. Ocean Dynamics 60, 957–972 (2010). https://doi.org/10.1007/s10236-010-0306-2
DOI: https://doi.org/10.1007/s10236-010-0306-2