1 Introduction and problem

In the Alps, backcountry skiing has become very popular in the last 50 years. Unfortunately, there are a lot of fatal accidents due to snow avalanches caused by skiers or snowboarders. They are of special public interest for various reasons.

In Austria, 79 avalanche accidents (17 fatalities) were reported during the winter of 2001/2002. Of these 17 fatalities, 16 were caused by alpine skiers or snowboarders. By far the highest number of accidents took place in Tyrol (2001/2002: 47 accidents/12 fatalities). From the very beginning, one aim has been to give backcountry skiers guidelines for avoiding avalanche accidents. This can be done by analyzing avalanche accidents empirically (using statistical methods). However, it is rather difficult to predict the risk (=probability) of avalanche events on a backcountry ski slope under given conditions. About 10 years ago, the mountain guide Werner Munter suggested a quantitative method to estimate the risk of avalanche events. Assuming that the variables:

  • danger levels from the local avalanche information service (1 = low to 5 = very high);

  • incline of the slope (three classes from flat to steep);

  • aspect of the slope (north, south); and

  • skiers' behaviour.

have an influence on the risk, he calculated a quantity which he called “remaining risk” (Munter 1997). On the basis of this quantity, he developed a strategy that tells backcountry skiers whether or not to go on a skiing tour (stop if the “remaining risk” is larger than 1). Plattner (2001) gives a concise description of his method (“reduction method”) and several other methods, which can be seen as modifications of Munter’s method. But as we showed in Pfeifer and Rothart (2002), Munter’s quantity cannot be understood as a probability of avalanche events. Moreover, there is no empirical evidence for his method because he did not take into account any skiing incidents without avalanche accidents. In various papers (e.g. Schweizer and Lütschg 2001) the variables above are considered to be the most important factors triggering avalanches. Besides “empirical” arguments based on avalanche accident data, snow experts also put forward snowpack properties (see Schweizer and Jamieson 2003).

There are various papers which try to explain the occurrence of avalanches on a given day using observations (days) with and without avalanches. They employ discriminant analysis, classification tree methods or neighbourhood methods; see e.g. Obled and Good (1980) and McCollister et al. (2003). However, the avalanche data in these studies are not restricted to avalanches triggered by skiers.

Our approach in Rothart and Pfeifer (2003) and Pfeifer and Rothart (2004) seems to be the first one where results on probabilities of avalanches triggered by skiers have been given. The model and data setting is close to what we present in the following. Grimsdottir and McClung (2006) also published results on avalanche trigger probabilities (using data on skiing incidents without avalanche accidents). Their analysis is based on a database of terrain use and avalanche accidents from a large heli-skiing operator in Canada. The study of conditional probabilities of accidents given the recorded pattern of terrain usage shows that the probability of triggering an avalanche depends highly on the danger level, moderately on the elevation level (“alpine”, “treeline”, “subtree”) and less than widely believed on the aspect of the slope. However, Grimsdottir and McClung (2006) did not consider probabilities conditioned on the incline of the slope.

In this article, we propose a decision strategy for backcountry skiers based on probabilities of a logistic regression model using variables such as danger level, incline of the slope and aspect of the slope. Available information on frequencies of skiers on slopes under specific conditions is included in the model. Additionally, a holdout validation is carried out in order to give an idea of the predictive power of our proposal. In Sect. 2, a description of the data for regression and validation (reports from the Tyrolean avalanche information center within five seasons) is given, and questions of data quality are considered. Section 3 describes the methods used to obtain a decision strategy. In Sect. 4, descriptive results, results of the logistic regression, the proposal for the strategy and results of the validation are given. Section 5 gives a discussion and a conclusion, including a comparison between our proposal and Munter’s method.

2 Data

Our aim is to relate counts of avalanche events to each class of incline and aspect for days with specified avalanche reports from the Tyrolean avalanche information service (Lawinenwarndienst Tirol). The observation period comprises the winter seasons 1999/2000, 2000/2001 and 2001/2002 (497 days of observation, modelling or training sample) and the winter seasons 2002/2003 and 2003/2004 (314 days of observation, validation sample). The region of observation is restricted to the federal state of Tyrol. For each day:

  • danger level \({\tt LWS}\) (1 = low, 2 = moderate, 3 = considerable, 4 = high, 5 = very high);

  • weather and skiing conditions \({\tt TOURV}\) (1–3 = good–bad);

  • and weekend \({\tt WOENDE}\) (yes/no).

are recorded in database 1. These data are available from the annual report of the Tyrolean avalanche information service (Amt der Tiroler Landesregierung Lawinenwarndienst 2001–2005). The weather conditions are taken from the 24 h weather forecast in Tyrol (Alpinwetterbericht der ZAMG-Wetterdienststelle Tirol). We proceeded to divide the weather conditions and skiing conditions into three classes (1 = good, 2 = moderate, 3 = bad). We carried out this classification as described in Scheuermann (1990).

In database 2, we recorded the reported backcountry skiing avalanche accidents in the same region within the seasons (1999–2004). In addition to the variables of the first database we took into consideration:

  • incline of the slope (1: <35°, 2: 35°–39°, 3: ≥40°);

  • aspect of the slope (northern sector, southern sector).

The quality and homogeneity of the reports of avalanche accidents have increased in recent years. In order to check the reliability of the accident data, we cross-checked them against those reported in the annual report of the Kuratorium für Alpine Sicherheit (based on the avalanche reports of the Austrian Alpine Police). However, it seems that there are a lot of accidents which are not reported at all (probably those with no serious injuries). Unfortunately, we cannot really assess the influence of this selection bias on our results (see discussion in Schweizer and Lütschg 2001).

Database 1 is expanded into six terrain classes for every day (3 classes of incline and 2 classes of aspect) resulting in 4,866 records. The variable \({\tt ANZAHL}\) results from counting avalanche accidents in each data record (in each class of incline and aspect each day).
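The expansion step can be illustrated with a minimal Python sketch. It is not the MS Access procedure used for this study; the function name `expand_days`, the dictionary keys and the `"north"`/`"south"` labels are assumptions made for illustration only. Each daily record is copied once per terrain class, and the accidents of database 2 are counted per day and class to obtain \({\tt ANZAHL}\).

```python
from itertools import product

INCLINE_CLASSES = (1, 2, 3)           # 1: <35°, 2: 35°–39°, 3: ≥40°
ASPECT_CLASSES = ("north", "south")   # hypothetical labels for the two sectors


def expand_days(days, accidents):
    """Expand one record per day into one record per day and terrain class.

    days:      list of dicts with keys "date", "LWS", "TOURV", "WOENDE" (database 1)
    accidents: list of (date, incline_class, aspect_class) tuples (database 2)
    """
    # Count the accidents per (day, incline class, aspect class).
    counts = {}
    for key in accidents:
        counts[key] = counts.get(key, 0) + 1

    records = []
    terrain_classes = list(product(INCLINE_CLASSES, ASPECT_CLASSES))
    for day in days:
        for incline, aspect in terrain_classes:
            record = dict(day, NEIG=incline, EXPOS=aspect)
            record["ANZAHL"] = counts.get((day["date"], incline, aspect), 0)
            records.append(record)
    return records
```

With 497 + 314 = 811 days of observation, this yields 811 × 6 = 4,866 records, in agreement with the number stated above.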

According to Grimsdottir and McClung (2006), it would be appropriate to take the elevation level into account. However, in our case it does not make sense to classify avalanche events into three classes (“alpine”, “treeline”, “subtree”), because about 85% of the observed avalanches are above treeline (2,000 m). Schweizer and Jamieson (2003) suggest that the influence of the elevation level on unstable snow profiles is smaller than the influence of aspect, incline of the slope or avalanche danger. There is some evidence that there are differences between avalanches in Europe and Canada with respect to the elevation level. Probably most backcountry skiers use slopes higher than the treeline (verification needed). As a result of this, we decided not to use the variable elevation level in our model.

In the following, the model is based on the probability of observing an accident in each class (incline, aspect) for a day characterized by danger level \({\tt LWS},\) weekend \({\tt WOENDE}\) and weather and skiing conditions \({\tt TOURV}.\)

3 Methods

Strictly speaking, our task consists in modelling the probability of reported backcountry skiing avalanche events for days with avalanche forecasts in every terrain class in Tyrol. For this purpose, we expanded database 1 as described above and counted the number of avalanche accidents for each data record. These database operations were done with the Microsoft programme ‘MS Access’. First, simple univariate descriptive statistics (frequencies) of the variables above are given for the first three seasons (1999–2002). We then proceed to establish a logistic regression model (no vs. one or more accidents as dependent variable), in order to estimate the probabilities p in question. Model selection leads to (see Sect. 4):

$$ \hbox{logit}({\mathbf{p}})={\tt LWS+NEIG+EXPOS+WOENDE+TOURV} $$

Besides danger level \({\tt LWS},\) incline of slope \({\tt NEIG}\) and aspect of slope \({\tt EXPOS},\) we took the qualitative variables skiing conditions \({\tt TOURV}\) and day of the week \({\tt WOENDE}\) into consideration. There is some evidence (Scheuermann 1990) that the frequency of skiers on the slope strongly depends on weather and snow conditions and on the days of the week (weekend, working days).
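For readers less familiar with the logit link, the relation between the probability p and the linear predictor can be written out as follows. The symbol η is introduced here only for illustration; it stands for the sum of the intercept and the estimated effects of \({\tt LWS},\) \({\tt NEIG},\) \({\tt EXPOS},\) \({\tt WOENDE}\) and \({\tt TOURV}\) for a given data record.

$$ \hbox{logit}(p)=\log\frac{p}{1-p}=\eta \quad\Longleftrightarrow\quad p=\frac{1}{1+e^{-\eta}} $$

Since p increases monotonically with η, a positive coefficient of a variable raises the estimated probability of an avalanche event and a negative coefficient lowers it.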

Finally, we try to establish a decision strategy for backcountry skiers based on empirical/statistical arguments. We have to fix a limit probability p* which represents the cut-point for the decision whether to stop or to go. If the probability of triggering an avalanche is smaller than p*, the backcountry skier would decide to go on the tour. If the probability is equal to or higher than p*, the skier's decision should be to stop the tour. We choose this cut-point as follows: for a given p, calculate the 2 × 2 contingency table of the variables avalanche occurrence (yes/no) and decision (stop/go), and quantify their dependence by the measure χ2(p). Then choose p* as the value that maximizes χ2(p) over 0 ≤ p ≤ 1. This yields a cut-point p* at which the dependence between avalanche occurrence and decision is maximal.
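The search for the cut-point can be sketched in a few lines of Python. This is an illustration under assumptions, not the R code used for the calculations: the function names are hypothetical, the Pearson statistic is computed without continuity correction (the text does not say which variant was used), and only the distinct fitted probabilities are tried as candidates, because χ2(p) can change only at these values.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    table [[a, b], [c, d]]; returns 0.0 if any margin is empty."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / margins


def best_cut_point(probs, events):
    """Return (p_star, chi2) maximizing the chi-square statistic.

    probs:  fitted probabilities, one per data record
    events: 0/1 indicators of avalanche occurrence, one per data record
    Decision rule: stop if the probability is >= the cut-point, go otherwise.
    """
    best = (None, -1.0)
    for cut in sorted(set(probs)):
        pairs = list(zip(probs, events))
        a = sum(1 for p, y in pairs if p >= cut and y == 1)  # stop, avalanche
        b = sum(1 for p, y in pairs if p >= cut and y == 0)  # stop, no avalanche
        c = sum(1 for p, y in pairs if p < cut and y == 1)   # go, avalanche
        d = sum(1 for p, y in pairs if p < cut and y == 0)   # go, no avalanche
        chi2 = chi_square_2x2(a, b, c, d)
        if chi2 > best[1]:  # ties keep the smallest cut-point
            best = (cut, chi2)
    return best
```

Keeping the smallest cut-point in case of ties is a conservative choice (more stop decisions); it is an assumption of this sketch, not something prescribed by the method.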

Two more seasons (2002–2004) are used for holdout validation: using the parameters estimated from the training sample (1999–2002), we give predictions for the validation data. Comparing predictions and observed data in the training and validation samples gives an idea of how good the predictions are. Finally, we compare our strategy with Munter’s reduction method. All the statistical calculations and graphics were done using the software package ‘R’.

4 Results

Figure 1 and Table 1 show relative frequencies (per cent) of avalanches yes/no for the training data depending on the danger level \({\tt LWS}.\) We notice that the occurrence of avalanches is more frequent if the danger level is higher (1 = low: 0.54%, 2 = moderate: 2.12%, 3 = considerable: 4.78%, 4 = high: 5.95%). The most frequent days are those with danger level 2 = moderate and 3 = considerable. There are no days with very high danger level. Moreover, there is a significant dependence between the danger level and the occurrence of avalanches (χ2 = 21.52, p = 0.0001). Generally, the occurrence of avalanches can be seen as an event which is rather rare.

Fig. 1 Relative frequencies (%) of avalanches (yes: red, no: green) dependent on the danger level

Table 1 Cross-classification table of avalanches yes/no and danger level

Figure 2 and Table 2 show relative frequencies of avalanches depending on the incline of the slope. As expected, the frequency of avalanches increases with the incline of the slope (<35°: 0.60%, 35°–39°: 3.92%, ≥40°: 5.33%). There is a significant dependence between the two variables (χ2 = 36.86, p < 0.0001). By construction of the data set (every day is expanded into all terrain classes), the incline classes are equally frequent.

Fig. 2 Relative frequencies (%) of avalanches (yes: red, no: green) dependent on the incline of the slope

Table 2 Cross-classification table of avalanches yes/no and incline of the slope

Figure 3 and Table 3 show relative frequencies of avalanches depending on the aspect of the slope (north, south). They are significantly higher (χ2 = 7.13, p = 0.008) on slopes in the northern sector than on slopes in the southern sector (north: 4.16%, south: 2.41%). Again, by construction of the data set, the two aspect classes are equally frequent.

Fig. 3 Relative frequencies (%) of avalanches (yes: red, no: green) dependent on the aspect of the slope

Table 3 Cross-classification table of avalanches yes/no and aspect of the slope

Figure 4 and Table 4 show the dependence of avalanche occurrences on the indicator weekend yes/no (χ2 = 2.84, p = 0.09). We notice that the frequency of avalanches is higher if the day of observation falls on a weekend (during the week: 2.91%, weekend: 4.09%). We expect that this is a result of the higher frequency of backcountry skiers on weekends.

Fig. 4 Relative frequencies (%) of avalanches (yes: red, no: green) dependent on the weekend yes/no

Table 4 Cross-classification table of avalanches yes/no and weekend yes/no

Figure 5 and Table 5 show the dependence of avalanche occurrences on the weather and skiing conditions (χ2 = 2.43, p = 0.2962). We can see that the frequency of avalanches is lower for bad weather and skiing conditions (good: 3.38%, moderate: 3.61%, bad: 2.36%). This effect is plausibly due to lower frequencies of backcountry skiers on days with bad weather and skiing conditions. The most frequent days are those with moderate weather and skiing conditions. Although neither dependence is significant on its own, the indicator weekend yes/no and the weather and skiing conditions allow us to control for effects due to the frequencies of backcountry skiers.

Fig. 5 Relative frequencies (%) of avalanches (yes: red, no: green) dependent on skiing conditions

Table 5 Cross-classification table of avalanches yes/no and skiing conditions

4.1 Model

We fit a logistic regression model to data of 3 seasons (1999–2002) in order to give reliable avalanche occurrence probabilities. Due to simplicity and reliability, the additive model (without any interaction terms) turned out to be the most appropriate. The variables \({\tt WOENDE}\) and \({\tt TOURV}\) are taken into the model in order to control effects due to different backcountry skier frequencies. Table 6 shows the results for this model (see Sect. 3).

Table 6 Estimated parameters of the logistic regression model

As we can see, the danger level (LWS) and the incline of the slope (NEIG) have the strongest positive effects on the probability of an avalanche. Additionally, the influence of the aspect of the slope is also significant. Of course, any avalanche expert would expect dependencies of this type. Figure 6 shows the distribution of the predictions (=probabilities) of the logistic regression model; the vertical line indicates p* (=0.03), the probability at which the association between the stop/go decisions and the occurrence of avalanches is maximal.

Fig. 6 Histogram of estimated probabilities and cut-point for decision stop/go

Using this probability p* (cut-point between stop and go), we are able to suggest a decision strategy depending on the variables danger level, incline of the slope and aspect of the slope in Figs. 7 and 8. The rows represent the three classes of slope incline and the columns represent the five classes of danger level (for description, see Sect. 2). The meaning of the colour of the boxes is the following:

  • green (go): relative frequency of cases with a predicted stop decision is equal to zero;

  • yellow (attention): relative frequency of cases with a predicted stop decision is smaller than 50%;

  • red (stop): relative frequency of cases with a predicted stop decision is larger than 50%.

Fig. 7 Decision strategy in the northern sector dependent on the danger level and the incline of the slope (go/green, attention/yellow and stop/red)

Fig. 8 Decision strategy in the southern sector dependent on the danger level and the incline of the slope (go/green, attention/yellow and stop/red)

The colour yellow indicates the cases, where we suggest cautious behaviour in order to minimize the risk of an avalanche accident (small groups, large distance between skiers, using additional information of the avalanche report etc.). Figures 7 and 8 give the proposal distinguishing between slopes in the north and the south.
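The colouring rule can be made concrete with a short Python sketch. It assumes that the relative frequency in a box (danger level × incline × aspect) is taken over the data records falling into that box, whose estimated probabilities differ because of the weekend indicator and the weather and skiing conditions. The function names are hypothetical, and treating exactly 50% as red is an assumption, since the rule above leaves this case open.

```python
def stop_fraction(probs, p_star=0.03):
    """Share of records in one box whose estimated probability is >= p_star."""
    if not probs:
        raise ValueError("box contains no records")
    return sum(1 for p in probs if p >= p_star) / len(probs)


def box_colour(fraction):
    """Map the relative frequency of predicted stop decisions to a colour."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction == 0.0:
        return "green"   # go
    if fraction < 0.5:
        return "yellow"  # attention
    return "red"         # stop (assumption: exactly 50% counts as red)
```

For example, a box in which 29% of the records lead to a stop decision is coloured yellow.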

In case of slopes in the northern sector:

  • and danger level 1 (low) pay attention at slopes steeper than 39°;

  • and danger level 2 (moderate) stop at slopes steeper than 39° and pay attention at slopes steeper than 34°;

  • and danger level 3 (considerable) stop at slopes steeper than 34° and pay attention at slopes steeper than 29°;

  • and danger level 4 (high) avoid backcountry skiing in general.

In case of slopes in the southern sector:

  • and danger level 2 (moderate) pay attention at slopes steeper than 39°;

  • and danger level 3 (considerable) stop at slopes steeper than 39° and pay attention at slopes steeper than 34°;

  • and danger level 4 (high) stop at slopes steeper than 34° and pay attention at slopes steeper than 29°;

  • and danger level 5 (very high) avoid backcountry skiing in general.

In general, the relative frequencies of cases with a predicted stop decision in yellow boxes (<50%) are smaller in the northern sector than in the southern sector. In particular, we notice the highest relative frequency (29%) on slopes in the southern sector up to an angle of 34° at danger level 4 (high). Consequently, one has to be more careful in yellow coloured classes in the southern sector than in those in the northern sector. Furthermore, one has to pay special attention on slopes at danger level 4 (high). Moreover, we recommend using Fig. 7 instead of Fig. 8 for the decision if the snow is wet (e.g. in spring in the afternoon).

For validation purposes, we compare observed and predicted counts of the training sample (Fig. 9) and the validation sample (Fig. 10). Black bars indicate the counts predicted by the logistic regression and red bars indicate the corresponding observed counts. The bars are ordered from small predicted counts on the left to large predicted counts on the right. As expected, the predictions for the validation data show a poorer fit than the predictions for the training data. For the validation data, the observed counts in the middle range are larger than predicted. Looking at the raw data, we can see that this is due to an increased rate of avalanche accidents at danger level 2 in the validation data.
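A comparison of this kind can be sketched as follows. How the records were grouped for Figs. 9 and 10 is not stated here, so the grouping by a key (for instance the covariate pattern of a record) is an assumption of this sketch, as is the function name. The predicted count of a group is the sum of the estimated probabilities of its records, and the observed count is the number of records with at least one accident.

```python
from collections import defaultdict


def predicted_vs_observed(probs, events, keys):
    """Aggregate predicted and observed counts per group.

    probs:  estimated probabilities, one per data record
    events: 0/1 indicators of avalanche occurrence, one per data record
    keys:   hashable group label per record (e.g. the covariate pattern)
    Returns a list of (key, predicted, observed), sorted by predicted count.
    """
    predicted = defaultdict(float)
    observed = defaultdict(int)
    for p, y, k in zip(probs, events, keys):
        predicted[k] += p   # expected number of records with an accident
        observed[k] += y    # actual number of records with an accident
    rows = [(k, predicted[k], observed[k]) for k in predicted]
    return sorted(rows, key=lambda row: row[1])
```

Applying the function once to the training sample and once to the validation sample, with the probabilities in both cases computed from the parameters fitted to the training sample, gives the two series of bars.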

Fig. 9 Barplot of predicted (black) and corresponding observed counts (red) for training data

Fig. 10 Barplot of predicted (black) and corresponding observed counts (red) for validation data

Choosing p* = 0.03 for the decision between stop and go, we are able to compare avalanche events (yes/no) and decision (stop/go) in a 2 × 2 contingency table for both the training data and the validation data. If we calculate dependence measures for the modelling case (χ2 = 67.72) and for the validation case (χ2 = 31.05), we can see that the χ2-value based on the training data is more than twice as high as the χ2-value based on the validation data (however, both p-values are <0.05). Comparing the contingency tables, we notice a larger false negative rate (avalanche yes/decision go) for the validation data. Looking at the raw data, we notice an increased rate of avalanche accidents at danger level 2 and incline level 2 (35°–39°) for the validation sample. Apart from avalanche data at danger level 2 as observed above, the predictions for the validation data seem to be quite reasonable.
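The 2 × 2 table and the false negative rate can be computed as in the following sketch. The function names are hypothetical, and the false negative rate is taken here in the usual sense as the share of "go" decisions among all records with an avalanche; whether the comparison above uses exactly this denominator is an assumption.

```python
def decision_table(probs, events, p_star=0.03):
    """Cross-classify decision (stop/go) and avalanche occurrence (1/0)."""
    table = {("stop", 1): 0, ("stop", 0): 0, ("go", 1): 0, ("go", 0): 0}
    for p, y in zip(probs, events):
        decision = "stop" if p >= p_star else "go"
        table[(decision, y)] += 1
    return table


def false_negative_rate(table):
    """Share of records with an avalanche for which the decision was 'go'."""
    avalanches = table[("go", 1)] + table[("stop", 1)]
    if avalanches == 0:
        return float("nan")
    return table[("go", 1)] / avalanches
```

Computing the rate separately for the training and the validation sample makes the difference between the two contingency tables directly comparable.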

5 Summary and conclusion

Probabilities of avalanche occurrences were calculated using a logistic regression model. As we can see in Table 6, the danger level and the incline of the slope have a positive effect on these probabilities. Furthermore, it is more likely that an avalanche incident occurs in the northern sector than in the southern sector. The variables skiing conditions and weekend yes/no were used to take into account the effects of frequencies of skiers on slopes. Finally, the probabilities based on this logistic regression model result in a decision strategy depending on danger level, aspect and incline of the slope, as described in Figs. 7 and 8. Red coloured boxes indicate classes of slopes where we suggest stopping (relative frequency of predicted stop cases larger than 50%). Yellow coloured boxes indicate classes of slopes where we suggest paying additional attention (relative frequency of predicted stop cases smaller than 50%). We recommend particular care in yellow coloured classes in the southern sector. Boxes of green colour indicate classes which are in our opinion of minor risk (relative frequency of predicted stop cases equal to zero).

In the southern sector, green boxes are equal to the elementary reduction method (at danger level 2 go at slopes not steeper than 39°; at danger level 3 go at slopes not steeper than 34°; and at danger level 4 go at slopes not steeper than 29°; see Munter 1997). In the northern sector, green and yellow boxes are equal to the elementary reduction method. The reduction method in general, as described in Plattner (2001), takes additional variables such as aspect of the slope (“second class reduction factors”) and behavioural considerations (“third class reduction factors”) into account.

Generally speaking, Munter’s method is in agreement with our proposal if we use second class reduction factors (aspect of the slope) and third class reduction factors (behavioural considerations). Avoiding slopes in the northern sector (value of the reduction factor: 2) can be considered as a shift of the green and yellow boxes in Fig. 7 to the right. And as we mentioned before, third class reduction factors can be seen as decision options in yellow coloured classes. Despite the empirical deficiency of Munter’s reduction method, we were able to find a remarkable agreement with our proposal.

The holdout validation showed that the predictions for the validation sample are quite reasonable with the exception of avalanche cases with danger level 2 and incline level 2. There are higher rates for these cases in the validation sample than expected from the training sample. This is remarkable, and more recent data should be examined in order to resolve this disagreement.

One could argue that there are effects of skier frequency on the probability of avalanches apart from \({\tt WOENDE}\) and \({\tt TOURV}.\) We think that the frequency of skiers mainly depends on \({\tt WOENDE}\) and \({\tt TOURV}\) and much less on aspect, incline and avalanche danger. Thus the influence can be modelled by these two variables without appreciable residual effects (of skier frequency) on the probability of avalanche events. However, this has to be verified, and we recommend additional research (count data based on a random sample) in order to get more precise information on the frequencies of backcountry skiers on slopes. Contrary to the study of Grimsdottir and McClung (2006), it is necessary to take into account the incline of the slope.