The utility of fused airborne laser scanning and multispectral data for improved wind damage risk assessment over a managed forest landscape in Finland

The potential of airborne laser scanning (ALS) and multispectral remote sensing data to aid in generating improved wind damage risk maps over large forested areas is demonstrated. This article outlines a framework to generate such maps, primarily utilizing the horizontal structural information contained in the ALS data. Validation was done over an area in Eastern Finland that had experienced sporadic wind damage. Wind is the most prominent disturbance element for Finnish forests. Hence, tools are needed to generate wind damage risk maps for large forested areas, and their possible changes under planned silvicultural operations. (1) How effective are ALS-based forest variables (e.g. distance to upwind forest stand edge, gap size) for identifying high wind damage risk areas? (2) Can robust estimates of predicted critical wind speeds for uprooting of trees be derived from these variables? (3) Can these critical wind speed estimates be improved using wind multipliers, which factor in topography and terrain roughness effects? We first outline a framework to generate several wind damage risk–related parameters from remote sensing data (ALS + multispectral). Then, we assess if such parameters have predictive power. That is, whether they help differentiate between damaged and background points. This verification exercise used 42 wind damaged points spread over a large area. Parameters derived from remote sensing data are shown to have predictive power. Risk models based on critical wind speeds are not that robust, but show potential for improvement. Overall, this work described a framework to get several wind risk–related parameters from remote sensing data. These parameters are shown to have potential in generating wind damage risk maps over large forested areas.

been damaged during autumn and summer storms since 2000(Kärhä et al. 2018). This is a substantial amount considering that, for example, the average annual roundwood removal in Finland in the past few years has been around 65 to 70 million m 3 (Vaahtera et al. 2018). The increased amount of wind damage may at least partially be explained by increasing volume of growing stock and changes in forest structure (e.g. age, size, tree species composition) due to forest management interventions. Increasing forest disturbances under a changing climate may even cancel out the expected higher forest productivity (Reyer et al. 2017).
In boreal conditions, wind damage is most likely to occur especially at stand edges adjacent to newly clearcut areas or in recently heavily thinned older stands (Laiho 1987;Zubizarreta-Gerendiain et al. 2012). This is because of the sudden increase in wind loading in such stands (Gardiner et al. 1997). In general, older Norway spruces (Picea abies [L.] Karst.) with shallow rooting are the most vulnerable to wind damage, followed by Scots pines (Pinus sylvestris [L.]). However, also broadleaves like silver and downy birches (Betula pendula and Betula pubescens) are vulnerable to wind damage in summer (with leaves) unlike in late autumn or winter (without leaves) (Zubizarreta-Gerendiain et al. 2012;. The susceptibility of trees and tree stands to wind damage is largely affected also by tree height, diameter/height ratio, crown and rooting characteristics, stand density, and site characteristics Dupont et al. 2015;Ikonen et al. 2017). Mitchell (2013) provides a comprehensive overview of the contributing factors as well as the impacts of such disturbances for forested areas. At the regional level, the fragmentation of the landscape (Zeng et al. 2007;Heinonen et al. 2009) and tree species composition (Ikonen et al. 2017) affect the risk of wind damage. For example, large height differences between neighbouring stands increase the risk for wind damage as well as a large proportion of Norway spruce dominated stands (Heinonen et al. 2009;Valinger and Fridman 2011;Ikonen et al. 2017;Zubizarreta-Gerendiain et al. 2012). The risk of wind damage to forests may also increase in boreal conditions under climate change even if the frequency and severity of wind storms do not increase Feser et al. 2015). This is due to the shortening of the frozen soil period, which still currently improves tree anchorage during the the windiest season of the year from late autumn to early spring, e.g. in Finland, from October to March ). On the other hand, topography and surface roughness also affect the local wind climate (e.g. mean wind speed, gustiness, and its duration), and thus, risk of wind damage to forests ).
Mechanistic wind damage risk models offer the means to predict the threshold wind speeds (critical wind speeds) needed to uproot or break trees in alternative management schedules, based on the characteristics of the subject stand and its immediate neighbour stands (e.g., , Peltola et al. (2010), Gardiner et al. (2008), and Dupont et al. (2015)). They allow also the calculation of the probability of wind damage based on local wind characteristics (e.g. Byrne and Mitchell 2012;Seidl et al. 2014;Ikonen et al. 2017). Based on HWIND model simulations , Heinonen et al. (2009) further on developed simple regression models, which may be used to predict the threshold wind speeds needed to uproot Scots pine, Norway spruce, and birch at upwind stand edges in Finnish conditions. These predictions are based on the characteristics of both the subject stand (tree species, mean tree height, and breast height diameter to height ratio) and the adjacent stand (mean height and area of the stand), respectively. Others have also employed statistical modelling (Schmidt et al. 2010;Albrecht et al. 2012;Kamimura et al. 2015;Suvanto et al. 2019) or machine learning techniques (Hanewinkel 2005;Hart et al. 2019) to investigate and understand the risk of wind damage.
Over the past two decades, airborne laser scanning (ALS) has established itself as a versatile tool for forest mensurationists and managers. For example, it has already become part of large-scale forest inventory pipelines of some countries, especially in the Scandinavian region (Maltamo and Packalen 2014). It also shows potential for being adapted for other diverse uses such as largearea productivity assessment (Tompalski et al. 2015;Gopalakrishnan et al. 2019), the characterization of streams and riparian areas (Tompalski et al. 2017), habitat mapping (Vierling et al. 2008), and general biodiversity assessment (Müller and Vierling 2014). ALS data has become an integral part of Finnish forest management scene for the past 10 years; it is used broadly (in combination with aerial images) to formulate management plans of almost all managed forests (including family-owned, companyowned, and state) (Holopainen et al. 2014). This data is collected on a periodical cycle jointly by the National Land Survey (NLS) of Finland and the Finnish Forest Centre. The current annual cost of this data collection and processing is approximately 1 to 2 million euros. Hence, given that a substantial amount of resources is spent annually in collecting this spatially explicit and large-area dataset, it is imperative that additional value-added uses of the data (such as windstorm risk estimation) are systematically studied. The ALS-based vertical structure of forests has been previously used to assess windstorm risk (Saarinen et al. 2016). In a previous study, ALS data was used to generate tree lists, which was subsequently used as input to the ForestGALES forest wind damage model (Suarez et al. 2008). But, to date, little effort has been dedicated to explore the utility of ALS-based stand-level information (and associated horizontal metrics) when coupled with a forest wind risk model in assessing wind damage risk. Meanwhile, the efficacy of such ALS-based horizontal metrics in the context of forest analysis and management has been highlighted before (Gopalakrishnan et al. 2018).
The objective of the present study is to determine the utility of remote sensing-based forest stand information to better estimate wind damage risks. Specifically, we try to address the following research questions: 1. How useful are forest stand variables derived from ALS and multispectral data (such as distance to upwind edge and gap size) for identifying high wind damage risk areas? 2. Can good estimates of critical wind speeds for uprooting trees be derived from these variables? 3. Can the estimates of these critical wind speeds be improved, when used in conjunction with wind multipliers, thus including the effects of terrain roughness and topography?
This set of questions will help us appraise whether ALS and multispectral data collected over large areas (mainly for land inventory purposes) can also be used to understand patterns of wind damage risk over forests.

Study area
Our study area is a large, forested area southwest of the city of Joensuu, eastern Finland (Fig. 1). This area represents a typical managed middle boreal landscape, with a dominance of coniferous species. The main tree species present are Scots pine, Norway spruce, silver birch, and downy birch. A few other deciduous tree species such as aspen (Populus tremula) and grey alder (Alnus incana) also occur in the midstory canopy layers. The proportion of Norway spruce, Scots pine, and broadleaves as measured by their growing stock volume are 40.5%, 51.4%, and 8.0%, respectively.

Remote sensing datasets used
Lidar data Airborne laser scanning (ALS) data was collected over the study area using the Leica ALS60 laser scanner system between 30 th April and 3 rd May, 2016. This instrument is able to record up to four echoes for each emitted pulse (including intensity measurements). The associated mean flying height was 2400 m above ground level, the scanning angle was ±20 • , and a side overlap between flight strips was approximately 20%. This flying configuration resulted in a nominal sampling density of about 0.9 emitted pulses per m 2 . A digital terrain model (DTM) was constructed by first classifying points as ground and non-ground returns using the approach described by Axelsson (2000). A raster DTM of 2 m spatial resolution was Fig. 1 (a) The location of the study area in Eastern Finland is marked by a red star in the inset (small) map. It is located in the municipality of Liperi, and is around 43,000 ha. The main map shows the outline of the study area, in red. The image enclosed is an aerial photomosaic of 2016, indicating the major land cover types present. (b) An ALS-derived canopy height map over the study area, indicating vegetation and land cover heights5 then obtained by computing the mean of the ground echoes within each raster cell. Values for the cells with no ground echoes were interpolated using Delaunay triangulation and triangular interpolation.
Aerial photographs Aerial images were acquired over the study area by the National Land Survey of Finland on 23 rd and 24 th of May 2016 using a DMC Z/I Intergraph (01-0128) digital aerial camera. The focal length of the camera was 30 mm and it had four spectral bands: red, green, blue, and near-infrared. The associated flying height was 4100 m above the ground level. The camera had 3456 × 1920 pixels for the multispectral bands, resulting in a ground sampling distance of approximately 160 cm. External sensor orientations were determined by the bundle block adjustment technique, using both ground control points and tie points. Two remote sensing products were used in this study: (1) microstand forest data and (2) gridded forest data. The microstand data was provided by Blom Kartta Ltd. The gridded data is publicly available as part of a data distribution service maintained by the Finnish Forest Centre (Finnish Forest Center (2019)).
Microstands are smaller and more homogeneous than conventional stands used in silviculture. They are created by means of image segmentation. Here, the segmentation was based on a fused version of the ALS and aerial image data. Initial segmentation was done using the Trimble eCognition program. Then, Chaikin's algorithm as implemented in GRASS GIS (GRASS Development Team (2017)) was used to slightly smooth the segment borders. These smoothed segments were then divided into smaller parts, which we call nanosegments. Nanosegments are comparable with cells in conventional ALS inventories. The benefit of using nanosegments is that they do not cross the microstand borders; i.e. problems with 'edge cells' are avoided. The process of creating microstands and nanosegments is described more detailed in Pascual et al. (2019). Forest stand attributes were then predicted to nanosegments and then aggregated to the microstands (see Maltamo and Packalen (2014)). Thus, the final output of this segmentation-based process is the partitioning of the forest landscape into microstands, each being associated with a set of forest attributes. In this work, we used the dominant canopy height and stand density attributes from each microstand. Here, dominant height is defined as the mean height of the 100 tallest trees per hectare. See Fig. 2 for an illustration of microstands, and their dominant heights.
The gridded forest data consists of high-resolution (cell size 16×16 m) forest attribute data from the Finnish Forest Centre (Finnish Forest Center (2019)). This again involves a fusion of ALS data and aerial images. Forest attributes were predicted for each such cell, similar to predictions made for each nanosegment in the case of microstands. We used three attributes from this dataset for our study, namely the main tree species, median height, and median diameter.

Field-observed wind damage data
Various locations in the study area were visited by an inventory crew during June to September, 2016. This was part of an independent inventorying effort. While travelling through the forest, they recorded several locations where recent wind damage to forest stands was observed (i.e. damaged within the last 2-3 years). This time-interval was assessed by the field crew by examining leaf or needle attachment and stem tarnishing patterns. The field crew identified wind damaged areas by noting that at least one uprooted tree was found on the ground. In some cases, as many as 20 uprooted trees were found on the ground at the damaged spot. Hence, snow damage (where trees are bent or broken in a characteristic manner, and typically involve fewer trees) can be ruled out on all sites. The location of each such wind damaged spot was recorded using a standard, consumer-grade, GNSS-enabled smartphone (estimated location accuracy is ±5 m). In some cases, highresolution satellite images coupled with visual inspection of the site were used to correct the geolocation of the observed wind damage areas. As the field crew were not sampling systematically the whole study area, such damage locations constitute only 'presence-only' data in contrast to presence-absence data (to be discussed later). From this dataset, we first discarded locations where only two trees or less had been uprooted or broken. Then, we applied the following additional criteria: (1) The locations should be in forested stands, with a (microstand) dominant height of at least 16 m, and (2) they should be at least be 100 m away from each other (as to minimize autocorrelation effects). In Finland, young forest stands (height <16 m) have very low possibility of wind damage (Zubizarreta-Gerendiain et al. 2012), hence the height cutoff. After this screening step, we were left with 42 locations of observed wind damage, spread over the study area.

Wind climate and wind multiplier data
Our field observations for wind damage data were from 2016, and therefore we ascertained the windstorm directions that could have been associated with such damaging events. From the observed wind climate data of 2014-2016 (Valta et al. 2019), we identified that windstorm magnitudes were strongest along three directions: west, northwest, and north. Hence, there is a strong possibility that the recorded wind damage had been caused by winds primarily along these directions.
We also downscaled regional meteorological wind speeds (i.e. either from the local weather station or from reanalysis) to local wind speeds, considering the effects of topography and surface roughness (land cover) characteristics. This was done using the wind multiplier approach . In this approach, the upwind characteristics of the local topography and surface roughness in a 20-m grid were taken into account via topographical wind speed multiplier (tm) and roughness wind speed multiplier (rm), respectively. In tm, the small scale variations of topography in a 1000-m upwind fetch from the point on interest, e.g. the wind speed accelerating effect of upward slope, as well as the general increase of wind speed as a function of elevation were considered. In rm, surface roughness of a 500-m upwind fetch was used to calculate effective roughness with larger weight closer the point of interest. The development of the wind multipliers used in this study has been described earlier in more detail ).

Calculation of critical wind speeds for uprooting of trees using remote sensing data
The critical wind speed (cws) required for causing damage to a tree can be defined as the threshold wind speed needed to uproot or break the tree. It inherently depends on the properties of individual trees and forest stands in consideration. In this study, we calculated the cws needed for uprooting of trees by using a regression model (Heinonen et al. 2009), which was developed based on calculations using a mechanistic wind damage model (HWIND, see ) under various stand and forest configurations. We use the simplified regression version of HWIND so as to make our task computationally tractable and hence scalable over large areas.
The wind direction is an important driver of critical wind speed estimates at any given forested point. This is because the cws at the point depends on the shelter provided by neighbouring stands and the size of upwind gap, both of which may differ largely depending on wind direction (Gardiner et al. 1997) . In this work, we considered winds blowing in from eight different directions, namely north (N), northeast (NE), east (E), southeast (SE), south (S), southwest (SW), west (W), and northwest (NW). For each forested point considered, we could hence calculate eight different cws values from the north to the northwest: cws N (critical wind speed at the point, considering that the wind is blowing from the north), cws NE , cws E , cws SE , cws S , cws SW , cws W , cws NW .
For the calculation of cws at any forested point, we first fixed the wind direction under consideration as one of the eight mentioned above (N, NE, E,...). Then, we executed several steps, a brief outline of which is given in Table 1. These steps are then replicated for all eight wind directions. We restricted ourselves to these eight for simplicity; the wind direction could be any arbitrary vector between 0 and 360 • . The implementation of the steps in Table 1 was done using several scripts written in the python programming language (version 2.7.10) and the R programming language (version 3.5.1), and with the help of ArcPy GIS modules (version 10.4.1).

Calculation of threshold meteorological wind speeds (tmws)
For our study area, wind multiplier data is available for each of the eight directions (see Section 2.4). Consider a steady wind from the north. Consider that the meteorological wind speed (i.e. measured at the local weather measuring station using standard techniques) is mws N . In this case, we have two wind speed multipliers available along this wind direction: (1) tm N : Topographical wind speed multiplier (north direction), (2) rm N : Roughness wind speed multiplier (north direction). Hence, the 'wind multiplier modified' wind speed at our point of interest can be calculated by multiplying these three values. And tree uprooting is highly probable if this calculated wind speed exceeds the critical wind speed cws N needed to cause Table 1 Outline of the five steps taken for calculation of the critical wind speed (cws) any given forested point Num.
Step description 1. The distance to the edge of the forest stand for the given point inside the given stand is estimated from the ALS-based forest microstand data. This is along the upwind direction, and is denoted as distT oEdge N , distT oEdge NE , ... depending on the wind direction considered. This estimation is done by constructing a transect along the upwind direction and scanning and analysing the profile of canopy height along it. Canopy height is uniform within a microstand, and changes (rise/drop) occur along the transect only during transitions from one microstand to another. A sudden and steep drop in canopy height, followed by a sizable 'gap' (i.e. relatively low vegetation height), signalled the location of the edge of the forest stand. We used a height percentage threshold to identify such steep drops: if the canopy height drops by more than 25% when transitioning between microstands, it was deemed to be a 'steep drop'. The stand density and dominant height (domH tStand) at the given point is also extracted from the microstand attribute data. 2.
The gap size (again, along the upwind direction; either of gapize N , gapize NE , ...) is also estimated from the microstand data. This was done by extending the transect (described above) beyond the forest stand edge and analysing the profile of canopy heights. The gap was considered 'closed' when there was a sudden, steep rise in canopy height when transitioning from one microstand to another. Here too, we used a height percentage threshold to identify such 'steep rises'. That is, if the canopy height reached at least 80% of the original height (i.e. before the drop of step 1), the gap was considered closed. This gap could be several land covers such as short seedling stands, agricultural fields, fallow plots, powerline clearings, roads, or water bodies. See Fig. 3 for an illustration of steps 1 and 2. 3.
The height of the surface/vegetation in the upwind gap/lower stand (i.e. sheltering stand) identified above is computed. This is the average height of the vegetation in the gap.

4.
Several forest variables are derived from the fine-resolution (16m) gridded data from the Finnish forestry centre. There, a combination of ALS data and aerial image data was used to estimate several species-level forest parameters (Maltamo and Packalen 2014). From this dataset, we derived: 1. Tree species: This is one of the following three: Scots pine, Norway spruce, or birch species; 2. Taper: This was defined as dbh/height, based on basal area weighted median diameter (DGM) and height (HGM); and 3. The basal area weighted median height (HGM) (i.e. tree height). 5.
If the point under consideration is very near the edge (i.e. less than 2.0 tree heights distance to the edge), the critical wind speed for uprooting is calculated using the regression model described in Heinonen et al. (2009). The variables used are tree species, taper (dbh/height), tree height, distance to the upwind edge, and size of the adjacent gap (estimations of which are described above). If the point is far enough inside the stand (more than 2.0 tree heights distance to the edge), the calculated cws is scaled as a function of the distance to the upwind edge and the stand density (based on , table 3). Based on this approach, the cws needed to uproot a tree increases when the distance to the upwind edge increases. The stand density affects cws less than the distance to the upwind edge, i.e. cws is only slightly increasing when stand density is increasing and vice versa.
A suitable wind direction has been assumed/fixed. Microstands are assumed to be equivalent of forest stands damage (mws N * tm N * rm N ≥ cws N ). We can re-formulate this inequality to define: where tmws N is defined as the threshold meteorological wind speed, for the north direction. This threshold, which is defined at any given forested point, is the wind speed needed at the local meteorological station along the given direction above which the critical wind speed at the damage point in question, along the given direction, is exceeded. Thus, we can define this threshold by dividing corresponding cws value with the associated wind multipliers. We further explain the utility of the concept of threshold meteorological wind speed using Fig. 4. In that figure, we assume that the cws values computed at points 1 and 2 are 18 m/s and 15 m/s, respectively (as indicated). Further, we assume that the corresponding topographic multipliers are 1.8 and 1.2, respectively. Hence, tmws needed for the southwest direction would be 10 and 12.5 m/s. Here, a tmws value of 10 m/s at point 1 means that a southwest wind of speed 10 m/s or higher as measured at point 3 (the meteorological station) would cause the critical wind speed threshold being breached at point 1. Point 1 has a higher critical wind speed in the southwest direction than 2. Hence, it would be considered less vulnerable, in the absence of wind multipliers. But we calculated that point 1 has a lower threshold meteorological wind speed (tmws), which makes it more vulnerable than 2; the local ridge topography increases the vulnerability of point 1.
All variables calculated can be seen in Table 2.

Validation of the approach
Outlines for validation We used the set of 42 damaged points (see Section 2.3) to validate the efficacy of variables (calculated above) for wind damage risk assessment. As this was a relatively low number of points, we used all of them for model formulation and validation, rather than keeping aside a proportion for independent testing of our  models. Hence, the strength of the formulated model, as evaluated using fit statistics, was the sole criteria to evaluate the goodness of fit of our models. As mentioned before, an analysis of the wind speed and direction data for the years of 2012 to 2016 suggested that the damaging storms for that period for our study area had originated from either west, northwest, or north. Hence, variables associated with these wind directions were studied in more detail in the analysis below.
We also evaluated the suitability of several sets of variables (see Table 2) to distinguish between these 42 damaged points, versus random (assumed non-damaged) forested points. We used two different and complementary modelling frameworks to evaluate the suitability of variables. The first was the maximum entropy modelling framework (usually abbreviated to maxent), a non-parametric, machine learning method (Phillips et al. 2006). The primary motivation for selecting maxent was its correct handling of for presence-only data, such as ours. For good discussion of the use of presence-only data and associated issues, see Gomes et al. (2018). Maxent models are built to discern and differentiate between the predictor variables at wind damage ('occurrence/presence') areas and those for the whole study area (i.e. the 'background'). It has been used extensively to study the factors behind observed biotic species distribution patterns (Elith et al. 2011), as well as for several observed abiotic phenomena such as wildfires (West et al. 2016;Parisien et al. 2012), landslides (Chen et al. 2017;Convertino et al. 2013), and even windstorms (Wade et al. 2015). Hence, it is well-suited to model wind damage occurrence over our study area. The second framework was the simple and straightforward approach of logistic regression Fig. 4 (a) Illustration of the modification of the critical wind speed (cws) by wind multipliers (1) (southwest wind). The map colours represent the topology over the study area: e.g. red spots denote small hills. We assume that the roughness multiplier rm SW is uniformly equal to 1.0 over this area (tmws SW = cws SW /tm SW ). Labels 1 and 2 represent two forested points, label 3 the local meteorological station. (b) Spatial patterns of the topographic wind multiplier for the southwest direction. Note that the patterns seen are inline with the topographic contours seen in part (a) (LR) (Hastie et al. 2013). LR has been previously used to model snow and wind damage (Canham et al. 2001;Saarinen et al. 2016;Valinger and Fridman 2011) and we use it as a second (simpler) alternative in this study. Maxent requires a large number of 'background' points, over where there is high probability that no wind damage has taken place. Essentially, in this framework, the predictor variable distribution at the occurrence points is contrasted with that at the backgro und points, and a suitable model that mimics these two distributions is formulated. For generating such background points, we randomly distributed several points over the study area constrained by two criteria. These criteria are similar to the ones outlined in Section 2.3 and are: (1) They are in forested stands, with a (microstand) dominant height of at least 16.0 m. (2) They are at least 100 m away from each other, and from any of the 42 locations of observed wind damage (again to minimize autocorrelation effects). This resulted in 4441 background points. The wind damage in our study area was limited to several small patches and there was absence of blowdown over large areas. Hence, there is an extremely high probability that such a randomly chosen forested spot would not be wind damaged.

Statistical analysis of computed values
We first analysed whether forest parameters related to wind damage were significantly different between damaged and background (assumed as 'non-damaged') points. This analysis was done separately for each of the eight possible wind directions, keeping in mind that the wind direction is a major driver of the associated spatial risk pattern. We used the Kruskal-Wallis ranked-sum test to compare the points (Hollander and Wolfe 1973). This test is whether there is a significant difference between the population medians, for the two given groups.

Suitability of critical wind speed estimates in predicting wind damage
We first used the following three predictor variables to model occurrence odds in the maxent model: cws W , cws NW , cws N . We ensured that the cross-correlation (Pearson's coefficient) between any of these variables was lesser than 0.6, before model fitting. Also, all predictor variables were scaled and normalized before being used in the model. That is, they were scaled so that their mean was 0.0 and standard deviation was 1.0. This was done to make the inter-comparison between them (i.e. their effect sizes) more intuitive. To evaluate maxent model strength, we primarily used two metrics: the gain and the area under receiving operating characteristics curve (AUC). The gain is an indication of how closely the model is concentrated around the presence samples. For example, a gain of 2.0 signifies that the average likelihood calculated by the model over the presence samples is 7.38 times (e 2.0 ) higher than at the background locations (Phillips 2005). The AUC is a measure of a model's ability to separate presence points from the background (Hastie et al. 2013). An AUC value of 0.5 or lesser suggests that the model-based predictions are not better or worse than random ones. Meanwhile, values between 0.5 and 0.7 indicate poor performance, while values increasing from 0.7 to 1.0 suggest progressively higher performance (Anderson et al. 2003). We also decided to use the default value of 1.0 for the regularization multiplier parameter in maxent, to avoid overfitting. It should be noted that the output (response) of the maxent model is an index of suitability (as a function of the predictor variables), and should not be interpreted as probability of occurrence. We also plotted and compared response curves for each predictor in the model. These curves indicate the dependence of the response variable to the selected predicted variable, keeping all other predictors constant, at their average value. If the maxent model was considered to have good predictive power, we opted for a 'jackknife test' of variable importance. This essentially involves excluding predictor variables and assessing the drop in model fit statistics (Phillips 2005).
In the case of LR too, we used the entire set of 42 damaged points and 4441 background points mentioned above to formulate the model. It is possible to use LR in understanding the underlying relations (even with such unbalanced datasets) (King and Zeng 2001) although the results should be treated with some caution. All LR regression coefficients (except the intercept) are wellestimated from an unbalanced dataset such as ours (Hosmer and Lemeshow 2000;King and Zeng 2001); hence, the relative importance of predictors can be estimated. Again, the discriminating ability of the LR models was tested using the non-threshold-based metric of AUC (Hosmer and Lemeshow 2000). As in the case of maxent, all predictor variables were normalized to make inter-comparisons easy. LR coefficients are easy to interpret, and they represent the change of the odds of the event of interest (i.e. wind damage, in our case) for a unit change in the predictor variable. Again, the associated sign indicates whether the odds increase or decrease for increase in the variable considered.
Suitability of tmws speed estimates in predicting wind damage Here, the maxent and logistic regression modelling frameworks were fitted with tmws-based predictor variables. All other steps taken were the same as described above. We hypothesize that the fit will be slightly better in this case as local topography and roughness-based effects (i.e. speeding up or slowing down of winds) are thus factored in.

Suitability of forest stand variable estimates for predicting wind damage
Here, we used a more direct approach, using a carefully chosen set of forest stand variables to model wind damage risk (the 'direct variables approach'). We decided to use a parsimonious set (i.e. only 4) of independent variables, considering that the number of wind damaged points was limited ('rule-of-10', see (Hosmer and Lemeshow 2000)). Also, we selected variable for which robust estimates were possible from the remote sensing data. Also, we kept in mind that the wind climate data (Section 2.4) suggests the importance of three wind origin directions (i.e. W, NW, N). Hence, we tried out (separately) three sets of independent variables: (1) isSpruce (dummy variable) + domH tStand + distT oEdge W + gapize W , (2) isSpruce (dummy variable) + domH tStand + distT oEdge NW + gapize NW , (3) isSpruce (dummy These values indicate the significance of the difference: for example, the value of 0.2453 in the first column is associated with the distance to edge of damaged versus background points (the two groups), considering a northerly wind. Hence, this high p value implies that the difference in the distances to the south edge is not significant Here, isSpruce is an indicator (dummy) variable indicating whether the gridcell is dominated by Norway spruce, and domH tStand is the average dominant height in the microstand segment. The first set above assumes that the damaging winds originated from the west, and hence points nearer to the forest stand edge on the eastern side of a gap were more susceptible to damage.

General differences between the damaged and background points
We estimated wind damage risk-related parameters listed in Table 2 using the remote sensing data, for 4483 points (i.e. 42 wind damaged and 4441 background points). This was done for all eight different wind directions mentioned in Section 2. We then used the Kruskal-Wallis test to detect significant differences between the damaged and background points ( Table 3). Four of the seven variables were significantly different between the damaged and background cases in the northwest direction, which is the highest proportion among the eight directions. Similarly, there were significant differences noticed for critical wind speed along the southwest and west directions, too. These results imply that the damage observed was mostly caused by winds from the northwest direction, although some damage may be associated with other directions as well. The distance to the stand edge is significantly different (p < 0.1) in the northwest direction, suggesting that this factor has probably contributed to observed damage. The magnitude of the differences of some notable entries of Table 3 can be seen in Fig. 5.

Classification models
Our estimates of critical wind speeds were fairly suitable (in a modelling context) for differentiating between damaged and background points (Table 4). Models formulated using both maxent and logistic regression frameworks were better than null models (i.e. AUC values were larger than 0.5). Likewise, threshold meteorological wind speed estimates yielded fair, but weaker, models. The 'direct variable approach' of using forest variables as predictors yielded better overall goodness-of-fit numbers than in the previous case (Table 5). With maxent, AUC values varied between 0.645 and 0.696, which indicated a good fit with observed data. The gain here is an indication of how closely the model is concentrated around the presence samples. In our case, the model gain of 0.171 for the northwest direction in Table  5 means that the average likelihood calculated by the model over the presence samples is 1.19 (e 0.171 , see Section 2.7) times higher, or 19% more, than that at a background point location. The logistic regression models also produced AUC values that were were quite similar to maxent, which implies that the basic structure of these models was quite comparable.
The response curves associated with the maxent model for the northwest direction can be seen in Fig. 6. We focused solely on this direction, both for the sake of brevity and because it was associated with the most robust models. Variable importance assessed via the maxent model, listed in the descending order, is: dominant height is highest (44.8%), distance to edge (northwest) is next (24.1%), gapsize (northwest) is next (23.7%), and 'isSpruce' has very low contribution (7.5%). Similarly, the effect size for various variables in the logistic regression-based models (of Table 5) is reported in Table 6. It should be noted that the sign of all effect estimates is logical and in agreement The median values for distance to edge are: 64.0m (damaged points) and 103.7m (background points); for upwind gap size: 43.6m (damaged) and 39.0m (background); for vegetation height in upwind gap: 6.8m (both damaged and background). This illustrates that the damaged points were closer to stands edges, and were associated with slightly bigger upwind gaps. A number of points had large upwind gap sizes (i.e. more than 200m) because of the lakes and the large clearings in the study area with previous results. This is except for gap size, where an increase in gapsize seems to be implausibly associated with lowering risks. We further explore the northwest wind direction model, as it has the best goodness of fit characteristics, see Table 7.

The utility of ALS and aerial image data in wind damage risk assessment
The overall results of our analysis indicate that ALS-based tree and stand characteristics have potential to be used in assessing spatial patterns of wind damage risk to forests. Canopy height has already been deemed important by many previous studies (Miller 1986;Mitchell et al. 2001;Albrecht et al. 2012;Locatelli et al. 2016;Díaz-Yáñez et al. 2017) as has been the distance to the upwind forest stand edge (Suvanto et al. 2018;Zubizarreta-Gerendiain et al. 2012;. The compelling additional contribution of this study is that we have outlined a new method to estimate these meaningful metrics over large forested areas using ALS and aerial image data. A major advantage of our dataset was that forest structure (both vertical and horizontal) was well represented, and this was solely because of the usage of ALS data. Both ALS and aerial image datasets comparable with the ones used in this study are available over the whole of Finland (Holopainen et al. 2014); hence, the methodology outlined has the potential for expansion to much larger areas. Based on our study, significant differences were seen along the west, northwest, and north directions for the critical wind speeds and consequently wind speeds needed for the damage at certain forested point at the local meteorological station (Table 3). This is inline with the fact that recent storms over the study area had tracks along those directions (Valta et al. 2019). Also, the distance to upwind stand edge was a significant variable (p value is 0.0642). Contrary to expectations, wind multipliers had less predictive value in our analyses, probably at least partially due to the relatively flat topography of the study area.
We used also two different analytical approaches (i.e. maxent and logistic regression) towards quantifying the relative importance of several sets of predictors. However, the results from these contrasting frameworks were roughly the same. Overall, the models formulated had only fair goodness of fit statistics, partly because of the low number of validation points available to us. Models based on the estimated critical wind speed were slightly less predictive  as those directly based on forest stand variables. This result may be due to several reasons. As for the 'direct variables' models, it was found out that the dominant height was highly predictive of the susceptibility to wind damage. Both upwind distance to stand edge and gap size variables were only moderately predictive. One reason for this could be that we did not distinguish in this study between temporary and permanent gaps. Temporary gaps are those created by forest clearcuts, logging decks, or such; they create vulnerable trees at the edge. On the other hand, permanent gaps are those associated with agricultural fields and lakes that are less vulnerable (for example, see Zubizarreta-Gerendiain et al. (2012)). In future studies, the 'age' of the gap (for example, a newly formed clearcut versus an agricultural field that has been in existence for decades), could be taken account in the predictions. Also, from Fig. 6c, one can see that in the maxent model (northwest), gap size increases susceptibility to damage at small values, but again decreases it at larger values. This could be due to the fact that smaller gaps are associated with clearcuts and logging decks (temporary) while larger ones are associated with large agricultural clearings and lakes (permanent).   Here, the northwest wind direction model from Table 6 is taken, and further analysed/simplified Values that are significant at the p = 0.05 level are highlighted (boldface)

Relevance of wind climate and direction
We observed a pattern of prominence of the northwest direction when significant differences were estimated between various stand-related metrics (Section 3). Our findings broadly indicate that the wind direction assumed is highly important in determining the spatial pattern of (future) storm damage risk. Hence, accurate projections of future windstorm climate are essential. In general, for the present climate, most wind storms over the study area are either along the west, northwest, or north directions, although rare easterly wind storms also occur (Valta et al. 2019). In a recent study by Ruosteenoja et al. (2019), it was found out that climate change will not have any major impact on wind direction during wind storm events. During the late autumn, westerly winds may slightly increase, but this is the prevailing direction also in the recent past climate.

Limitations of estimates and avenues for improvement
Our critical wind speed estimates were rather inaccurate. This is because they were only fairly predictive in distinguishing between damaged and non-damaged points (Section 3). It has been shown by earlier work that inaccuracies in the target tree characteristics (i.e. height or diameter) or other parameters that influence the magnitude of the wind loading experienced at the tree level can greatly influence the estimated wind speeds needed for uprooting Zeng et al. 2006). In our study, for example, the individual tree-level taper was generally overestimated, mostly because of the particulars of taper estimation from the forest centre's gridded data (Section 2.5). For any individual tree in each 16-m gridcell, we had computed taper as: DGM/H GM, where DGM and H GM are diameter and heights at the gridcell level. Such an averaging at the gridcell level can lead to biased taper estimates.
To understand this effect further, we looked at forest plot data that had been collected over the same area in 2016. From the total set (592 plots), we first selected plots that had H GM values greater than 16.0 m, and then randomly selected 50 from that set. There was a total of 1450 trees distributed among these plots. For each tree, we computed its taper (diameter/height). We then computed the difference in this taper value and the taper estimated at the plot level (DGM/H GM) (i.e. taper plot-taper tree). This value represents the error in taper estimation, for that single tree. Then, histograms of these error values (considering all 1450 trees) were made (Fig. 7). A positive bias in plotlevel taper estimation can be seen. In many cases, taper is overestimated by more than 50%. Keeping in mind that plots here correspond to the 16-m forest centre gridcells that we used, one can understand that our taper values may not be representative of individual tree tapers. They may have led to cws values being overestimated in many cases. Predicting taper (by DBH) by remote sensing remains a challenging task. In most cases, allometric models are used to predict DBH from other variables such as tree height. But this ignores tree density and silvicultural history and hence tend to be inaccurate . Another possible reason for the error in taper is that ALS data was collected in 2016, after many of the damage-causing storms had happened.
Another important factor that we could not include in our work was the effect of soil type. According to one recent study, the HWIND model tends to overestimate the wind speeds required to uproot trees for soils associated with shallow rooting (for example, partly open bedrock or poor drainage soils) while underestimating those with permanent Fig. 7 a Histogram of the error in taper (the difference between the taper calculated at the tree level and the taper calculated at the plot level) for plots examined. At the tree level, taper is calculated as the ratio between the individual tree's diameter and height, while at the plot level, it is calculated as the ratio between DGM and H GM. Many error magnitudes are large, considering that taper typically varies between 0.7 and 1.5. b Histogram of the relative error of taper (%), where 'error' definition is the same as that in a. The median value is 2.9% edges, respectively (Zubizarreta-Gerendiain et al. 2012). The authors further state that this was a cause of high discrepancy between predicted and observed damage in their case (n = 189 trees). Detailed soil maps are available over much of Finland (Lilja and Nevalainen 2006), and could be incorporated into our framework at a later stage.
Upwind gap size was not an important predictor for the maxent modelling framework, nor was it significant in the logistic regression models. This could be due to the following two reasons: (1) Wind damage susceptibility is somewhat insensitive to gap size magnitude, provided that it is sufficiently large enough. For example, a decrease of only 19% in critical wind speed was estimated for a six-fold increase in gap size, from 1h (1 tree height) to 6h. (2) The gaps in our study area have been existing for quite some time (i.e. several years) in most of the cases. Hence, the downwind trees have got acclimatized to the stronger winds.
We were also constrained by the limited number of validation points available to us from the study area. In other words, more validation points, such as plots from the national forest inventory (Suvanto et al. 2019), would have made our results more conclusive. Collection of such wind damage data over large areas is both laborious and time-consuming. In this context, remote sensing may provide a method to obtain such information quickly and objectively. There have been some studies that used highresolution imagery (collected via airplanes or UAVs) to detect windthrow at the scale of single trees (Honkavaara et al. 2013;Duan et al. 2017). The sentinel satellite-based radar data has also been used to detect windthrow (Rüetschi et al. 2019). A promising dataset in the context of collecting more validation points is discussed in Forzieri et al. (2020). It consists of over 80,000 spatially explicit demarcations of forest areas affected by windstorm damage, spread all over Europe, and over a period of 18 years.

Scope of maxent-based models for wind damage assessment
Our primary motivation for the selection of the maxent modelling framework is its suitability for the 'presenceonly' field data. We recognized that with such field data, wind damage occurrence modelling is similar to several species prevalence modelling efforts, where one tries to establish a relationship between occurrence records of a biological species and the environmental characteristics at that location (Franklin 2010). Thus, it makes sense to leverage on almost three decades of research done (in ecological contexts) for accommodating such presenceonly data. There are several caveats associated with such modelling efforts (e.g. issues in sample selection in some cases, see Kramer-Schadt et al. (2013); difficulty in estimating prevalence; also see Elith et al. (2011)). But the relative ease of collecting such data (and hence, realizing larger datasets) motivates the use of models that are able to handle it well. Such field data can be collected either by forest inventory crews or citizen or community sciencerelated efforts (McKinley et al. 2017). Meanwhile, the utility of maxent for modelling the prevalence of abiotic phenomena has been established in the context of wildfires (West et al. 2016;Parisien et al. 2012) and landslides (Chen et al. 2017;Convertino et al. 2013). In our case, when compared with LR, maxent was able to indicate interesting non-linear response patterns with respect to dominant height and gapsize, see Section 4.1 and Fig. 6.

Significance and implications for forest management
It has been reported by previous studies that the management of forest stands tends to highly influence their vulnerability to storms (Albrecht et al. 2012). Interestingly, it has also been argued that the cause of the increase in wind damage seen over Europe recently can be traced back to management-related changes in forest structure (Seidl et al. 2011). Our results also highlight the effect of tree height on wind damage risk and further suggests that one effective way to reduce susceptibility is to make final felling before forest stands reach 'high risk' heights. We have also shown the importance of reducing the extent of exposed edges, along the windward direction. This becomes important when evaluating stand harvesting options. That is, our methods and wind risk models could be coupled with a forest planning and optimization model (such as Monsu (Pukkala 2011)) to evaluate the effects of various silvicultural management options to the wind damage risks of the forested landscape. The alternative spatial and temporal schedules of harvesting possibly could be evaluated to find the most optimal one, with respect to wind damage risk consideration and other relevant management objectives. For example, a framework for deriving the optimal spatial harvesting pattern given the probability of wind damage has been suggested by Meilby et al. (2001).

Conclusions
We developed a methodology by which ALS and aerial image data could be used to develop several useful predictor variables (based on forest structure) for wind damage risk. Based on our findings: (1) This data is well-suited for deriving several forest stand configuration-related wind damage risk parameters; (2) the dominant height and distance to upwind stand edge are important in predicting wind damage risk; (3) critical wind speed (calculated based on simple regression models) and wind multipliers show promise in this avenue, but further work is required to make estimates more robust; and (4) wind direction is an important parameter, and it is essential to understand the pattern of frequency distribution of future wind storm winds for projecting future risk. We also showed that models based on edge proximity and dominant height have good predictive power, thus opening avenues for creating spatially explicit wind damage risk maps over large areas. However, further research is needed over bigger areas to refine our methods and the datasets used. Such exploration would hence help in fully realizing the capability of remote sensing to identify windstorm-vulnerable forest edges and parts, and to better guide forest management planning.
Funding Open access funding provided by University of Eastern Finland (UEF) including Kuopio University Hospital. This work was funded by the Strategic Research Council of the Academy of Finland for the FORBIO project (decision number 314224), at the University of Eastern Finland, Finland. This work also got funding also from the Ministry of Agriculture and Forestry as a part of a consortium project entitled 'New weather and climate tools for forest-based bioeconomy' (Säätyö; grant agreement number 100 625/03.02.06.00/2018).

Data availability
The datasets generated during and/or analysed during the current study are not publicly available as they are not fully owned by the authors, but are available from the corresponding author on reasonable request.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommonshorg/licenses/by/4.0/.