1 Introduction

Coastal meteorology focuses on the meteorological phenomena that are directly caused—or significantly influenced—by the rapid changes in atmospheric conditions that occur at the land-sea transition zone (National Research Council 1994). As it is estimated that more than 600 million people (about 10% of the world population) today live in coastal areas less than 10 m.a.s.l. (UN 2017), a thorough understanding of weather conditions in coastal environments is crucial to correctly evaluate the critical parameters that affect human health and wellness, e.g., air quality and thermal comfort.

The sea breeze regime (SBR) is a relevant topic in the context of coastal meteorology, as it characterizes the mesoscale circulation, accounting for many spatially and temporally nested phenomena (Miller et al. 2003), and with cascading impacts on the local environment (Masselink 1998). The onset of the SBR is triggered by the temperature difference between the colder sea surface and the warmer land, giving rise to a pressure gradient that is particularly intense during sunny summer days. In addition to the thermal forcing, the SBR is also affected by other factors, such as the coastline curvature (Steele et al. 2013), the proximity of reliefs and orography (He et al. 2020), and, above all, the synoptic-scale weather conditions (Di Bernardino et al. 2021a). The separation of the breeze regime from the synoptic conditions is indeed one of the major problems encountered in SBR studies (Zhong and Takle 1993), whose investigation is made even harder in the urban environment of coastal cities, where the SBR also interacts with the urban heat island (UHI) effect (Bauer 2020). The main characteristics of the UHI are: (1) the local increase in temperature, especially at night and during summer, due to the presence of heat-absorbing materials such as asphalt and concrete, which modify the natural albedo and heat capacity of the surfaces, and (2) the limited evapotranspiration and soil permeability, due to the reduction of green areas. The occurrence, intensity, and inland penetration of the SBR can strongly affect local air quality in cities, as wind conditions are among the main factors responsible for the accumulation of pollutants near the ground. For instance, the inland ingress of cold and humid air in the morning can lead to the development of a thermal internal boundary layer, in which the contaminants produced at ground level can be trapped (Wei et al. 2018).
Similarly, the absence of the breeze regime, in conjunction with unfavourable synoptic weather conditions (i.e., calm winds, persistent atmospheric stability), can cause thermal inversions, with a consequent increase in pollutant concentrations near the ground (Fortelli et al. 2016). On the other hand, during the night hours, the air in contact with the mainland cools more rapidly than that over the ocean, due to the higher heat capacity and consequent thermal inertia of the sea. This leads to a situation of local high pressure over the land and low pressure above the sea, with a consequent displacement of air from the land to the sea, typically defined as the land breeze regime. In this case, the wind can transport atmospheric pollutants out of the cities, improving the air quality.

Although the SBR has been widely studied in recent years from both the experimental (Cenedese et al. 2000; Puygrenier et al. 2005) and the numerical (Fu et al. 2021; Di Bernardino et al. 2021b) point of view, its identification relies only on subjective and site-specific criteria, impairing the objective separation of the pure sea-breeze contribution to the local circulation from the concurrent synoptic influence, which can disturb, limit, or exacerbate the inland penetration of the breeze (Azorin-Molina et al. 2011). Only recently have researchers attempted to isolate the pure sea breeze component by analysing high-resolution models and observations and by developing less site-specific criteria and no-threshold methods (Cafaro et al. 2019).

The objective classification of synoptic weather patterns, and the subsequent isolation of the SBR from synoptic-scale phenomena, can be carried out by means of clustering techniques. In recent years, clustering algorithms have been widely used for meteorological and climatological studies, although the choice of the input parameters represents a critical constraint for successful algorithm setup (see the review by Huth et al. 2008 for details about clustering techniques and meteorological applications). Among these, thanks to its simplicity of implementation and computational efficiency, the k-means algorithm is widely used for the classification of environmental data, e.g., observations from meteorological stations (Li et al. 2020), evapotranspiration measurements (Ferreira et al. 2019), and air quality parameters (Adame et al. 2012), as well as for wind energy resource assessments (Al-Shammari et al. 2016) and rainfall prediction (Geetha and Nasira 2014).

The principal aim of this study is the operative identification of the anemological patterns over the Tyrrhenian coast of central Italy, in the Lazio region, by applying the k-means clustering algorithm. The investigated domain also includes the city of Rome, which is the largest and most populous Italian city. Following the procedure proposed by Li et al. (2020), the clustering procedure only requires wind speed and direction data at the surface, measured at coastal stations. The minor modifications that have been introduced to better account for the region's peculiarities are described in Sect. 2.2. With respect to other clustering algorithms, this method offers the advantage of affordable data requirements, anemological observations being generally available with high temporal resolution and spatial coverage. In addition, the objective parameters used to guide the clustering procedure also provide valuable diagnostics to investigate the relative roles of ventilation and air recirculation in the region. The resulting classification, based on the analysis of the anemological data over the period 2014–2020, provides a representative picture of the synoptic and low-level circulation in the area of interest. Future applications envisage the characterization of the relationship between anemological patterns and air quality in the urban area of Rome and its surroundings and the assessment of the wind energy resource in the region. If sufficiently long time series are available (a period of at least 30 years, according to WMO 2017), this method will also allow the investigation of the impact of climate change on the inter-annual variability of typical weather conditions (Ogaya and Peñuelas 2021; Pohl et al. 2021). Moreover, the procedure could be easily extended and applied to other coastal regions with similar conditions (e.g., shoreline orientation, breeze onset, and offset times).

The paper is organized as follows. Section 2 presents the study domain, the meteorological dataset, and the k-means clustering technique. The main results are described and discussed in Sect. 3, especially focusing on the seasonal and monthly occurrence of the identified clusters. Finally, Sect. 4 summarizes the conclusions of the study and possible future developments.

2 Methodology and dataset

2.1 Study area and meteorological data

The domain investigated in this study covers the coastal area of the Lazio region, central Italy, extending for about 100 km in the North-West/South-East direction. The area is generally flat near the coast and hilly inland, where four main orographic systems are located, i.e., the Tolfa and Sabatini Mountains in the North and the Alban Hills and the Lepini Mountains in the South (see Fig. 1). The domain includes the city of Rome, the capital of Italy, and its metropolitan area, which together host about 4.3 million inhabitants, besides several smaller towns.

Fig. 1

Map of the selected stations. Numbers refer to the Station ID reported in Table 1. The main geographical elements of the region are shown, while the red circle depicts the municipality of Rome

In the present study, seven in-situ weather stations are selected, belonging to the meteorological network of the Regional Agency for the Development and Innovation of Agriculture of Lazio (ARSIAL, http://www.arsial.it/arsial/). The distribution of the meteorological sites is depicted in Fig. 1, and their topographic characteristics are summarized in Table 1. All the stations are located within a maximum distance of 30 km from the coast, where the effects of the sea breeze have been shown to be still relevant, also reaching the innermost Rome area (Colacino 1982; Di Bernardino et al. 2021c). Stations 01, 02, and 07 are equipped with Campbell instruments (Campbell Scientific Europe, Loughborough, United Kingdom), while the sensors of stations 03, 04, 05, and 06 are produced by Siap + Micros (Siap + Micros S.p.A., San Fior, Treviso, Italy). All sensors comply with WMO (World Meteorological Organization) requirements, although detailed information about data uncertainty is currently unavailable.

Table 1 List of the surface weather stations considered in the present study

Although the selected datasets had been previously validated by ARSIAL, additional pre-processing and quality control checks have been carried out. First, data were visually inspected and screened to eliminate gross errors, to standardize the formats and units of measurement, and to assess the presence of missing data. In all the selected stations, the percentage of missing hourly averaged wind speed and direction data was found to be less than 1.7%, with the exception of stations 01 and 07, for which the missing data amounted to 5.9% and 4.8% of the total, respectively. Missing data appeared to be randomly distributed over time and not to affect the results of the study. Next, the hourly averaged time series were inspected for errors attributable to non-climatological signals, such as instrumental artefacts, changes in the position of the weather stations, and non-routine maintenance. As a result of such screening, only the hourly averaged wind speed and direction data covering the period 01/01/2014–31/12/2020 were retained for subsequent analysis.

2.2 K-means clustering

The k-means clustering algorithm is a method of vector quantization, developed by Wong and Hartigan (1979), which allows for the grouping of data within a given number of subsets, called clusters, which are defined following the seven objective parameters described below.

The technique requires two user-defined input parameters: (1) the number of clusters (k), into which the data will be divided, and (2) the initial (random) position of the centroid for each cluster. Once the centroids have been randomly initialized, the algorithm assigns each measurement to the closest cluster. Then, the positions of the cluster centroids are iteratively updated, according to the values of the objective parameters computed for cluster members, and optimized until convergence is achieved. Results are assumed to converge when the position of the centroids stabilizes, that is, when the SSE or inertia (i.e., the sum of the squared deviations of the computed variables from the corresponding cluster mean) is minimized. It should be emphasized that the k-means algorithm forcibly assigns each analysed element to one of the clusters, without contemplating situations that do not fall within the predefined clusters.
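The assign-and-update loop described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: the function names `kmeans` and `sq_dist`, the `seed` argument, and the choice of stopping as soon as the assignments no longer change are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=300, seed=0):
    """Lloyd-style k-means on a list of equal-length tuples.

    Returns (centroids, labels, sse). Illustrative sketch only.
    """
    rng = random.Random(seed)
    # Random initialization: k distinct observations become the first centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every observation is forced into its nearest cluster.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer move
        labels = new_labels
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # SSE (inertia): squared deviations of the members from their cluster mean.
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse
```

Because every observation is assigned to its nearest centroid, the sketch also reproduces the limitation noted above: there is no "unclassified" outcome.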

The definition of the objective parameters that guide the clustering procedure moves from the work of Allwine and Whiteman (1994), who proposed a method for the investigation of air mass ventilation, recirculation, and stagnation. Using wind observations from a single station, they defined three characteristic quantities: (1) the “transport distance”, L, which is the net displacement of the wind trajectory, i.e., the straight-line distance between the start point and the endpoint of the path travelled by the air particle during a day, (2) the “wind run”, S, defined as the scalar sum of the distances travelled at each time step (1 hour, in the present case), and (3) the “recirculation factor”, R, which is the ratio between L and S. Low (high) values of S describe situations of stagnation (ventilation). R tends to one when the wind has a persistent direction, while it approaches zero in the case of substantial atmospheric recirculation (Fig. 2). In addition, Allwine and Whiteman (1994) defined the wind rotation angle, θ, measured from the North clockwise.
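As an illustration of these definitions, the sketch below computes L, S, R, and θ from one day of hourly wind speed and direction. It follows the convention stated in the text (R = L/S). The function names, the assumption that the input direction is the meteorological one (the direction the wind blows from), and the choice of taking θ as the bearing of the net displacement are illustrative assumptions, not details given in the source.

```python
import math


def wind_components(speed, direction_deg):
    """Zonal (u) and meridional (v) components from speed (m/s) and the
    meteorological direction (degrees the wind blows FROM, clockwise from North)."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


def wind_integrals(speeds, directions_deg, dt_s=3600.0):
    """Return (L, S, R, theta) for one day of equally spaced observations.

    L and S are in metres (divide by 1000 for km); theta is in degrees,
    clockwise from North, taken here as the bearing of the net displacement.
    """
    uv = [wind_components(s, d) for s, d in zip(speeds, directions_deg)]
    x = dt_s * sum(u for u, _ in uv)  # net eastward displacement
    y = dt_s * sum(v for _, v in uv)  # net northward displacement
    L = math.hypot(x, y)              # transport distance
    S = dt_s * sum(math.hypot(u, v) for u, v in uv)  # wind run
    R = L / S if S > 0 else float("nan")  # R = L / S, as defined in the text
    theta = math.degrees(math.atan2(x, y)) % 360.0
    return L, S, R, theta
```

A steady wind gives R = 1, whereas a wind that blows for half of the day from one direction and for the other half from the opposite direction, at the same speed, gives R close to zero.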

Fig. 2

Schematic illustration of the wind run S (blue arrows), transport distance L (green arrow), and transport direction θ (red arrow). Panels a and b show the cases of high and low recirculation, respectively

However, the above integral quantities are insufficient to fully describe the dynamics of coastal circulation and to discriminate between different weather regimes. Additional diagnostics are thus needed to better characterize the SBR and isolate it from the concomitant synoptic weather conditions (Li et al. 2020). We therefore adopted the following list of relevant objective parameters:

  (i) \(u_{\mathrm{morn}}\), averaged zonal wind component in the period 0800–1200 UTC;

  (ii) \(v_{\mathrm{morn}}\), averaged meridional wind component in the period 0800–1200 UTC;

  (iii) \(u_{\mathrm{after}}\), averaged zonal wind component in the period 1500–1800 UTC;

  (iv) \(v_{\mathrm{after}}\), averaged meridional wind component in the period 1500–1800 UTC;

  (v) R, recirculation factor;

  (vi) cos(θ);

  (vii) sin(θ).

The first four parameters were selected on consideration of the typical hour range that comprises the onset and decline of the SBR in the study area (Di Bernardino et al. 2021c), regardless of the seasonal trend in insolation. They enable the clustering algorithm to capture events of persistent wind throughout the day and to separate them from the breeze-dominated days. Rather than the wind rotation angle, θ, cos(θ) and sin(θ) were used, so as to avoid discontinuities in the North quadrants and the consequent clustering errors. Before applying the k-means algorithm, all the parameters were normalized, so as to have zero mean and unit variance.
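A minimal sketch of how the seven parameters could be assembled for one day and then normalized is given below. The function names, the inclusive treatment of the window bounds (hours 8 to 12 and 15 to 18 UTC), and the use of the population standard deviation are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

# Averaging windows in UTC hours; inclusive bounds are an assumption.
MORNING = range(8, 13)     # 0800-1200 UTC
AFTERNOON = range(15, 19)  # 1500-1800 UTC


def daily_features(u, v, R, theta_deg):
    """Seven-parameter vector for one day.

    u, v: 24 hourly wind components indexed by UTC hour;
    R: recirculation factor; theta_deg: angle in degrees.
    """
    t = math.radians(theta_deg)
    return [
        mean(u[h] for h in MORNING),
        mean(v[h] for h in MORNING),
        mean(u[h] for h in AFTERNOON),
        mean(v[h] for h in AFTERNOON),
        R,
        math.cos(t),  # cos/sin remove the jump between 359 and 0 degrees
        math.sin(t),
    ]


def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in rows
    ]
```

With this encoding, two days with θ = 359° and θ = 1° differ only slightly in sin(θ) and not at all in cos(θ), whereas the raw angles would differ by 358°.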

Accordingly, the time series from the seven meteorological stations, now spanning a total of 2482 days, were further processed to derive the regional time series of the objective parameters. Spatial averaging ensures that local effects due to the orographic differences across the selected sites are reduced and that the resulting values coherently represent the mesoscale patterns.

3 Results

3.1 Identification of clustering patterns

In this section, we describe the methods adopted for the evaluation of the most suitable number of clusters and the main characteristics of the identified patterns.

The greatest difficulty for the application of k-means clustering lies in the prescription of the optimal number of clusters, kO, for data classification. Here, kO is obtained by applying the “elbow” and the “silhouette” methods proposed by Kodinariya and Makwana (2013), and comparing the results. In both methods, the clustering algorithm is run several times with the same input data and varying k, so as to construct alternative k-dependent metrics. The “elbow” method derives inertia (or SSE) as a function of k, and identifies kO as the value after which inertia starts decreasing following a quasi-linear trend. On the other hand, the “silhouette” method measures how cohesive and well separated the k clusters are, by comparing the distance of each element from the other members of its own cluster with its distance from the elements of the nearest neighbouring cluster. In this case, kO is the value that maximizes the average silhouette coefficient. The resulting alternative metrics are shown in Fig. 3. For the area of interest, kO = 3 appears to be a reasonable choice.
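The two metrics can be sketched as follows for a given labelling of the data. In practice they would be evaluated on the labelling obtained for each candidate k. The function names are hypothetical, and the silhouette coefficient is written in its usual form, s = (b − a) / max(a, b), with a score of zero assigned to single-member clusters by convention.

```python
import math


def inertia(points, labels):
    """Within-cluster SSE: squared distance of each point from its cluster mean."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centre = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((a - b) ** 2 for a, b in zip(p, centre)) for p in members
        )
    return total


def mean_silhouette(points, labels):
    """Average silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster (cohesion)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of another cluster (separation)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for c, other in clusters.items()
            if c != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

A labelling that groups nearby points together yields low inertia and a silhouette close to one, while a labelling that mixes distant points yields high inertia and a negative silhouette.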

Fig. 3

Results of the a “elbow” and b “silhouette” methods used for the identification of the correct number of clusters

Afterwards, the algorithm was run with the following setup: (1) random choice of the initial centroid positions, (2) a maximum of 300 iterations for the k-means algorithm, and (3) a relative tolerance of \(10^{-4}\), with respect to the Frobenius norm, on the difference between the cluster centres of two consecutive iterations. The results shown here were achieved after 18 iterations and exhibit reasonable internal coherence, as indicated by a within-cluster SSE of about \(10^{4}\).

Figure 4 presents the daily trend of wind speed (represented by the length of the vectors and expressed in m/s) and direction (indicated by the orientation of the vectors and expressed in degrees) for the three clusters detected by the k-means algorithm. The plots show the diurnal variation of the wind vectors averaged over the population of each cluster.

Fig. 4

Temporal variation of hourly wind speed (m/s, vector length) and direction (degrees, vector orientation) for the three clusters identified by the k-means algorithm

Figure 4a shows the wind evolution along the day for the cluster hereinafter indicated as “Northeasterly Cluster”. The prevailing wind direction is stably from the northeast throughout the day. Wind intensity ranges between 1.75 m/s (1500 UTC) and 3.42 m/s (1000 UTC), reaching a maximum in the central hours of the day, with a daily average of 2.37 m/s. The local circulation appears to be dominated by synoptic forcing, and the SBR fails to develop.

The second typical pattern detected, shown in Fig. 4b, is hereinafter called “Breeze Cluster”. During the night, the wind blows from the Northeast quadrant, i.e., from inland, with very low intensity (a minimum of 1.03 m/s is reached at 0500 UTC). At 0800 UTC, the stations record a sharp change in wind direction, associated with generally calm wind conditions. From 0900 UTC onwards, the wind progressively veers, until it blows perpendicular to the coastline. Such rotation corresponds to a rapid increase in wind speed, as expected at the onset of the sea breeze, which reaches a maximum of 3.72 m/s at 1400 UTC. On average, the breeze is present until 1800 UTC, when the wind rotates clockwise and the speed decreases back to the night values typical of the land breeze. It is worth pointing out that the graphed trend was obtained by pooling all data together, without accounting for seasonal variability, and therefore smoothing out any seasonal difference in the onset and offset time of the breeze. This second cluster is representative of pure-breeze events, in which the mesoscale and the local circulation prevail over the synoptic-scale dynamics.

Figure 4c depicts the third pattern detected, hereinafter named “Southeasterly Cluster”. Here, the synoptic and mesoscale circulations interact, giving rise to a more complex anemological pattern. During the night, the wind blows from the East, with speeds below 1.5 m/s. From 0600 UTC onwards, the wind rotates clockwise, settling in the Southeast quadrant. The wind velocity gradually increases until 1300 UTC, and then it slowly decreases. The wind direction remains fairly constant until 1700 UTC. In the evening, the wind backs (i.e., rotates counterclockwise) and a decrease in intensity is observed. The anemological trend of this cluster is attributable to the varying relative weights of the synoptic-scale and sea-breeze regimes in determining the overall wind pattern, with the low-intensity synoptic-scale wind from the East prevailing at night, and the SBR causing the acceleration and the diurnal rotation of the flow.

The results relative to the breeze cluster are in agreement with previous studies conducted in the same area. Colacino (1982) and Petenko et al. (2011) showed that this region is subject to two prevailing wind regimes: (1) the drainage effect from the bottom of the Tiber valley, which generates a flow from the North and is mainly observed in the eastern side of the city, where the proximity of the reliefs influences the circulation, and (2) the SBR, which affects the region during daytime throughout the year, but more frequently in summer. Furthermore, these results are in accordance with the findings by Di Bernardino et al. (2021c), who showed that the wind blows from the Southwest when the urban center of Rome is reached by the sea breeze, and remains almost constant in direction for several hours.

Table 2 summarizes the relative occurrence of each cluster over the period of interest, together with the time means and standard deviations of the associated objective parameters. The three distinct patterns occur with comparable frequencies, the Breeze Cluster being present 37% of the time, followed by the Southeasterly Cluster (32.6%) and by the Northeasterly Cluster (30.4%). As expected, the average values of L, S, and R are comparable across the latter two clusters, with R equal to 0.79 and 0.65 for the Northeasterly and the Southeasterly Clusters, respectively, both indicating high ventilation and low recirculation conditions as a consequence of the persistent wind direction (Fig. 2, panel b). This result confirms the internal coherence and robustness of the classification obtained via the k-means algorithm. On the other hand, the Breeze Cluster is characterized by a markedly lower R (0.49), consistent with the recirculation conditions induced by wind rotation (Fig. 2, panel a), which also account for the significantly lower L (30.2 km). A decrease in S (58.5 km) is also observed. These values are in agreement with the results in Li et al. (2020): although the regions of interest have different morphological and climatological characteristics, the wind-integral quantities have similar values for comparable patterns.

Table 2 Occurrence and mean (standard deviation) of input parameters for the three clusters

To appreciate the relative influence of synoptic and local atmospheric circulation on the different clusters we identified, Fig. 5 shows the corresponding average synoptic maps, as derived from the ECMWF atmospheric reanalysis ERA5 data (https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5), by only considering the days belonging to each single cluster. Geopotential data, used to compute geopotential heights, and wind components at the 850 hPa level with a spatial resolution of ~ 30 km (Albergel et al. 2018) are considered. Maps refer to 1200 UTC, chosen as representative of the whole day.
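For reference, the conversion from geopotential to geopotential height mentioned above is the standard one. The symbols \(\Phi\), \(Z\), and \(g_0\) are introduced here only for illustration and are not used elsewhere in the paper:

\[
Z = \frac{\Phi}{g_0}, \qquad g_0 = 9.80665\ \mathrm{m\,s^{-2}},
\]

where \(\Phi\) is the geopotential (in m² s⁻²) and \(Z\) is the geopotential height, expressed in geopotential metres.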

Fig. 5

Average synoptic conditions for a northeasterly cluster, b breeze cluster, and c southeasterly cluster at 1200 UTC from ERA5 reanalysis. Contours show the geopotential height of the 850 hPa surface (in geopotential metres), while vectors depict wind intensity and direction at 850 hPa

For the Northeasterly Cluster, Fig. 5a shows a high-pressure system located over the North Sea and a low-pressure system over the Balkans, as well as a trough over the western Mediterranean. In this baric configuration, Italy is at the convergence zone between a western anticyclonic system and an eastern cyclonic system. These conditions typically give rise to northeasterly winds. Figure 5b depicts the distinctive synoptic circulation occurring in the Breeze Cluster. The map depicts the typical summer conditions of the Italian peninsula: no evident pressure gradients are observed over Italy, and, consequently, there are no relevant synoptic winds, ensuring intense heat and good weather. These conditions, coupled with the intense atmospheric heating, responsible for the difference in temperature (and, therefore, pressure) between ocean waters and the dry land, favour the predominance of mesoscale or local-scale phenomena, such as the SBR. Finally, Fig. 5c presents the characteristic conditions occurring in the case of the Southeasterly Cluster. A high pressure region is located over northern Europe and a high baric gradient is observable between northern and southern Italy, giving rise to winds from the southern quadrant. Again, we underline that the map shows the average conditions at 1200 UTC, i.e., when the SBR interacts with the synoptic circulation, as shown by the daily wind trend in Fig. 4c. Southern winds determine persistent high-pressure conditions and the advection of air masses from North Africa, also responsible for the transport of Saharan dust up to the middle latitudes (Gobbi et al. 2019).

The aforementioned results suggest that the clustering algorithm properly worked, allowing for the identification of anemological low-level patterns affected by both the synoptic and the mesoscale circulation, by only requiring the measurements of wind speed and direction from coastal stations. In what follows, the cluster analysis is deepened by considering the monthly variability of the wind integral quantities and the seasonal and monthly distribution of pattern occurrences.

3.2 Seasonal and monthly analysis

For each cluster, Fig. 6a shows the distribution of the number of occurrences across seasons. As already remarked, during the analysed period, the three clusters have comparable frequencies, with a slight predominance of the Breeze Cluster (37%) with respect to the Southeasterly and Northeasterly Clusters (32.6% and 30.4%, respectively). From the figure, it is evident that the Northeasterly Cluster mainly occurs during winter (43.8% of its total occurrences) and that the autumn and spring samples still represent a significant fraction of the total (29.9% and 18.4%, respectively). The relative number of occurrences sharply decreases in summer (7.9%). On the other hand, the Southeasterly Cluster has a more homogeneous seasonal distribution, the relative number of occurrences exhibiting limited variations across seasons (spring: 29.5%, autumn: 25.1%, winter: 23.9%, summer: 21.5%). Finally, the overall slightly prevalent Breeze Cluster is mostly observed in summer (44% of its total occurrences) and spring (29.1%), with a less populated autumn sample (18.3% of the total) and minimum manifestation in winter (6.6%). The results are in agreement with those presented by Mastrantonio et al. (2006), who studied the anemological conditions of the same area, showing the persistence of sea-breeze circulation for a large part of the year, with a pronounced diurnal cycle. Moreover, in the same region Petenko et al. (2011) detected the SBR throughout the year, with higher frequency in summer and spring, by exploiting both ground-based observations and numerical simulations.

Fig. 6

Histogram of the a seasonal and b monthly occurrence of different clusters

From the point of view of seasonal time frequencies, i.e., the fraction of the days of each season assigned to each cluster, it is observed that the Breeze and the Southeasterly Clusters have similar frequencies in spring (41.3% and 37.1%, respectively), as compared to the more sporadic Northeasterly Cluster (21.6%). In summer, the SBR prevails (63.3% of the time), followed by the southeasterly pattern (27.3%), which is associated with the interaction of the atmospheric flow with the North African promontories, resulting in the typical summer high pressure and fair weather conditions (Fig. 5). In autumn, the three patterns have a homogeneous distribution, with a slight prevalence of the Northeasterly and Southeasterly Clusters (37.9% and 34.1%, respectively). During winter these two patterns definitely prevail, with the former appearing 54.5% of the time and the latter exhibiting a relative frequency of 32%.
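The two kinds of percentages quoted in this section differ only in the normalization of the same season-by-cluster counts: by the total number of days in each cluster (first paragraph of this section) or by the total number of days in each season (paragraph above). The sketch below makes the distinction explicit. The function name is hypothetical and the input is made-up toy data, not the study's classification.

```python
from collections import Counter


def occurrence_tables(days):
    """days: iterable of (season, cluster) pairs, one per classified day.

    Returns two dicts keyed by (season, cluster), both in percent:
    the share of each cluster's days falling in a season, and
    the share of each season's days belonging to a cluster.
    """
    days = list(days)
    joint = Counter(days)
    per_cluster = Counter(c for _, c in days)
    per_season = Counter(s for s, _ in days)
    within_cluster = {
        (s, c): 100.0 * n / per_cluster[c] for (s, c), n in joint.items()
    }
    within_season = {
        (s, c): 100.0 * n / per_season[s] for (s, c), n in joint.items()
    }
    return within_cluster, within_season


# Toy example with invented counts, for illustration only.
toy = (
    [("summer", "breeze")] * 6
    + [("summer", "northeasterly")] * 2
    + [("winter", "northeasterly")] * 2
)
by_cluster, by_season = occurrence_tables(toy)
```

In the toy example, all breeze days fall in summer, yet the breeze accounts for only three quarters of the summer days, which shows why the two sets of percentages are not interchangeable.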

The variability of the relative frequencies associated with the three clusters is more evident when considering their monthly variations along the year, as shown in Fig. 6b. The monthly frequency of the Northeasterly Cluster decreases along the transition from winter to spring, reaching a minimum in summer (6% in July), and increasing again in autumn and winter. In December, this cluster occurs about 75% of the time. The plot highlights that the monthly frequency of the Southeasterly Cluster is fairly homogeneous throughout the year, ranging from 20 to 40% of the total monthly cases, with minima in August (27%) and December (20%). The evolution in the monthly frequency of the Breeze Cluster, as expected, is in antiphase with respect to that of the Northeasterly Cluster, reaching its minimum in winter and its maximum in summer (9% in December and 68% in July).

Finally, to evaluate the temporal variability of the wind-integral quantities defined in Sect. 2.2, Fig. 7 shows the monthly evolution of the average wind run in km, S (panel a), and recirculation factor, \(R\) (panel b), obtained for each cluster.

Fig. 7

Monthly trend of a wind run, S, and b recirculation factor, R, for different clusters

For the Northeasterly Cluster, \(S\) shows a strong monthly variability, reaching a maximum of about 112 km in spring (March), then progressively decreasing in summer, autumn, and winter, when it reaches a minimum of approximately 58 km (December). The monthly evolution of \(R\) is in line with that of \(S\), although with more limited variability (maximum in March, 0.90, minima in June and September, 0.72). During spring and summer, when the occurrence of the Northeasterly Cluster is anyway rarer, \(R\) assumes the largest values in the year, despite the comparatively higher values of \(S\) with respect to the other clusters. This implies that the spring and summer months are characterized by more intense ventilation, mainly due to persistent synoptic wind conditions, as highlighted in the previous section.

The Breeze Cluster exhibits \(S\) values between about 50 km (September) and 80 km (January), with a peak of approximately 112 km in December. \(S\) decreases in spring, reaches a minimum in summer, and then rises again during autumn and winter. \(R\) is relatively lower in spring and autumn and relatively higher in summer, when the breeze is more frequent and intense, and air particle displacement is anyway enhanced. In general, \(R\) ranges between 0.40 and 0.55, with a somewhat unexpected peak of about 0.70 in December, which is still sustained in January. However, it is important to note that the relatively higher winter values of \(R\) are still lower than those found for the other two clusters, and may be associated with the passage of cold fronts, which align with the breeze front and are not distinguished by the algorithm because they have similar anemological characteristics. This aspect will be further investigated in future research.

For the Southeasterly Cluster, \(S\) reaches the highest values (about 95 km) in autumn and winter (February and December) and the lowest (about 60 km) in summer (July and August). The values of \(R\) are constantly confined between the curves derived for the two other clusters, and range between 0.58 (September) and 0.70 (November), with limited monthly variability.

The analysis of seasonal and monthly variability therefore allows us to conclude that, in the examined region, the synoptic and the mesoscale meteorological conditions alternately prevail and govern the local ground-level circulation, depending on the time of year. In winter, the persistence of synoptic wind from either the northeast or the southeast dominates, while in the summer months, i.e., when the temperature and pressure gradients across the coastline increase, the mesoscale sea breeze circulation takes the lead. In spring and autumn, the atmospheric conditions vary rapidly, and the occurrences of the three clusters are more evenly distributed.

4 Conclusions

In this work, the observations of wind intensity and direction provided by seven ground-based meteorological stations, located in a coastal area of central Italy, are used to describe the anemological peculiarities of the region. The data, covering a 7-year interval, are analysed using the k-means clustering algorithm, which is widely used in environmental and meteorological studies.

Clustering is carried out by considering seven objective parameters, based on the wind-integral quantities proposed by Allwine and Whiteman (1994) and on the diagnostics suggested by Li et al. (2020). The former allow the evaluation of the daily ventilation/stagnation conditions, while the latter provide useful parameters to identify the onset and the cessation of the SBR. This method allows discriminating between days when the atmospheric low-level circulation is dominated by the synoptic conditions and days when the SBR prevails.

For the region of interest, the algorithm identified three characteristic clusters:

  • the Northeasterly Cluster, dominated by the synoptic conditions and characterized by prevailing winds from Northeast throughout the day;

  • the Breeze Cluster, when the mesoscale circulation takes over and the onset of the sea breeze can be recognized from the increase in wind velocity and the wind rotation towards the southwest quadrant, i.e., perpendicularly to the coastline;

  • the Southeasterly Cluster, dominated by the synoptic conditions and occurring when the wind blows from the southeast quadrant for the entire duration of the day.

From this information, general conclusions can be drawn. When the Northeasterly Cluster occurs, Italy appears to be located at the convergence of a high-pressure system located over the North Sea and a low-pressure system over the Balkans. Conversely, when the Southeasterly Cluster arises, the Italian peninsula is crossed by a large baric gradient, which generates winds from the southern quadrant. For both the Northeasterly and the Southeasterly Clusters, the synoptic circulation dominates over the local and the mesoscale dynamics, and the SBR fails to develop. Nonetheless, in the latter case the atmospheric flows at the different scales interact, generating a more complex anemological pattern, in which the SBR causes wind intensity to increase and wind direction to veer during the central hours of the day, after which the process is, to some extent, reversed in the afternoon. Moreover, the Southeasterly Cluster can occur in conjunction with Saharan dust outbreaks. Conversely, the Breeze Cluster is characterized by the prevalence of the local circulation over the synoptic one. The synoptic circulation mainly governs the coastal weather conditions from October to March, when the Northeasterly and the Southeasterly Clusters occur in more than half of the selected cases. On the contrary, the SBR prevails in summer, even if the sea breeze regime can be detected during the whole year.

It should be emphasized that all available data are forcibly classified according to the prescribed parameter kO, which is set a priori as the number of clusters that best accounts for most possible weather configurations, based on the analysis of the specific time series in use. This implies that ambiguous events (e.g., rotations of the synoptic-scale wind along the day for reasons other than the interaction with the sea breeze) can be erroneously attributed to one of the clusters, when they would in fact call for their own category. Another possible confounding factor arises from the need to limit the analysis to stations that lie within 30 km of the coastline, which only allows the inclusion of one urban location. As a consequence, the effects of the Rome UHI on the low-level local circulation could not be evaluated in the present investigation. Future experimental research is planned (1) to assess whether accounting for the UHI requires increasing kO, and (2) to generalize the clustering algorithm by linking the onset and cessation of the sea breeze to the hours of daylight, which would also make it possible to account for the seasonal trend in insolation. The results of this work can help investigate the relationship between meteorological clusters and air quality in coastal areas, and characterize the local wind energy resource. If sufficiently long datasets are available, the proposed methods can be applied to the analysis of the regional climatology of coastal areas.