1 Introduction

Surfing has been growing in popularity worldwide, at both competitive and recreational levels. The minimal training and equipment required make it an appealing water sport for everyone [20]. A surf session comprises four main types of activity: paddling, stationary periods, wave riding and miscellaneous events [20, 21]. Despite the marked growth in the field of sports trackers, there have not been significant developments in the specific area of surfing [13]. Although performing manoeuvres is the ultimate goal in surfing, few solutions are available to assist surfers, especially concerning the characterization of rotational movements and performed manoeuvres.

Tools that can track and measure surfers’ progress over time are appealing and needed by surf practitioners, especially when no external guidance is provided [11, 20]. However, analysing the events that occur during wave riding periods can be a challenge. Moreover, even though a large part of surf session analytics (e.g. paddle duration, wave counting) is already performed by some commercial solutions, these still lack manoeuvre identification and evaluation, which is a relatively new and unexplored domain [11, 20].

Most of these surf monitoring systems generate biofeedback based on sensor data retrieved during the training session, and are capable of providing some additional information about the executed movements [9, 11, 19]. Currently, most of these solutions can only count the number of waves, measure paddle time, and estimate speed, distance and movement patterns during the session using Global Positioning System (GPS) and/or Inertial Measurement Unit (IMU) measurements [1, 2, 10, 13]. However, they lack detail during the most important surf event, wave riding [11]. Moreover, and specifically for manoeuvre detection purposes, few studies and real-world validations are available [13, 25]. As such, there is a gap in the market concerning surf manoeuvre detection.

Surf manoeuvres may be difficult to distinguish, especially for non-experts [7, 26]. Thus, the automatic analysis of these events may be considered a difficult task. Differences between two manoeuvres are often only evident in the “dynamics” or “elegance” of their execution. Even in surf competitions, the evaluation made by the judges is usually subjective and based on the average of their opinions, which further highlights the importance of a solution capable of thorough identification, characterisation and evaluation of all the performed manoeuvres [7].

This context motivated the development of a new algorithm for the detection and characterisation of the performed manoeuvres during wave riding periods in real-world conditions, solely resorting to the inertial sensors embedded in a smartphone. In this sense, this manuscript proposes a novel approach for the segmentation and identification of the performed manoeuvres for moderate to experienced surfers.

The remainder of the paper is organised as follows: Sect. 2 describes prior work conducted in this field; Sect. 3 describes the dataset and proposed methodology of the study; Sects. 4 and 5 report and discuss the main findings of this study, respectively; Sect. 6 highlights the main conclusions and points out possible directions for future work.

2 Related Work

There are only a few commercially available solutions for surf monitoring purposes, and even fewer if we only consider manoeuvre detection and analysis. Most of the current commercially available solutions are based only on GPS data. Thus, these solutions can only extract top-level information about the surf session, such as wave count, travelled distance or wave speed. Detecting more complex surfing movements, such as in-wave manoeuvres, is hard or even impossible with this type of solution, given the single source of available data. One example of this type of commercially available solution is the Rip Curl Search GPS watch. Additionally, some mobile applications have also been developed, such as Surf Track [3], Dawn Patrol [5] and WavesTracker [4], but very little information regarding their functioning, validity or system setup is provided.

The combined use of GPS and inertial sensor data, i.e. accelerometer, gyroscope and magnetometer sensors, has been widely reported in the literature to effectively increase the overall robustness and precision of human activity recognition applications [13]. Some commercially available solutions already combine these sources of data, such as the GlassyPro wristband [1] or the Xensr Air [6] surfboard-mounted device. However, they may not be entirely suitable for manoeuvre detection purposes. Wrist-worn devices, like the GlassyPro, may be practical and easy to use, but their positioning on the body may make them insufficient for manoeuvre detection and characterization. Torso and board rotations are important metrics for wave performance analysis and extremely important when trying to identify and evaluate certain manoeuvres, and wrist-mounted sensors may not be suitable for this task. There are also devices that were designed to be mounted on the surfboard, like Xensr Air and Trace Up. However, to the best of the authors’ knowledge, these devices are currently not available for purchase. Moreover, they can be associated with some safety issues due to their size and attachment to the surfboard, and they do not track the surfer’s actual movements.

Besides the commercially available solutions, some research studies have also been conducted, aimed at surf monitoring and/or performance analysis. Madureira et al. [18] proposed an algorithm comparing the use of the GPS sensor alone and together with inertial sensor data for wave detection. Similarly, Hoettinger et al. [14] proposed a machine learning based approach for differentiating wave from non-wave events, using sliding windows of 2.0 s with 75% overlap, also achieving accurate results. However, none of these studies performed any type of analysis related to manoeuvre identification or characterisation, highlighting the novelty of this study.

3 Methodology

3.1 Data Annotation

The dataset reported in [13] was used to conduct this study, featuring raw data from the accelerometer, gyroscope and magnetometer during all of the recorded surf sessions. This dataset had already been annotated for several surf events, including wave riding periods. We selected the sessions of the 5 advanced-level surfers in the dataset, since appropriately performing manoeuvres during wave riding is a demanding task in the sport, characteristic of experienced surfers (Fig. 1).

Fig. 1. Snapshot frames of the execution of the detected manoeuvres: snap (top) and cutback (bottom).

A dual-source data annotation tool enabling synchronized signal and video visualization was created to finely annotate manoeuvre periods in each wave. Manoeuvre annotation was a challenging process, due to the similarity between some manoeuvres, the particular technique of each surfer, and the lack of a clear view of the surfer in some situations (e.g. agitated sea, distance). Moreover, annotations were performed by non-specialists in this area and can therefore be subjective and prone to human error. To minimize this influence, we discarded events for which the annotator was unsure of the label to assign or of the temporal limits of the movement, as well as unsuccessful manoeuvres, i.e. those ending in a fall.

Table 1. Dataset description.

Table 1 provides an overview of the final dataset with annotated manoeuvres. The most represented manoeuvres were cutback and snap. A snap consists of a radical change of trajectory in the pocket or on the top of the wave; when performed abruptly, it produces spectacular and flashy buckets of spray. A cutback consists of riding up the wave shoulder, turning back towards the breaking part of the wave without losing speed, and ending with a re-entry in the critical section of the wave.

3.2 Data Processing

Figure 2 presents a graphical overview of the processing steps used in this study. A thorough explanation of each of these steps is provided in the following subsections.

Fig. 2. Data processing operations overview.

Data Stream Segmentation. An overlapping sliding window approach was implemented for data stream segmentation (Fig. 3). The selection of window size may be considered an empirical and task-oriented problem, highly related to the duration of the event of interest and/or the maximum expected recognition latency. Different window sizes were therefore evaluated, considering the distribution of manoeuvre duration in our dataset presented in Table 1: 2.0, 2.5, 3.0, and 3.5 s, with a fixed overlap of 25%. Each time window with a match of over 70% with an annotated manoeuvre was assigned a positive class label. All remaining samples were assigned a negative class label.

Windows corresponding to the annotated ground truth manoeuvres were also considered in the pipeline optimization step.
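The segmentation and labelling rule can be illustrated with a minimal sketch. It assumes that the 70% match is measured as the fraction of the window duration covered by an annotated manoeuvre, which is one possible reading of the criterion; the function names and the example intervals are hypothetical and not part of the original implementation.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_s, end_s), in seconds


def sliding_windows(duration_s: float, size_s: float = 2.0,
                    overlap: float = 0.25) -> List[Interval]:
    """Fixed-size windows over [0, duration_s] with a fractional overlap."""
    step = size_s * (1.0 - overlap)  # 2.0 s with 25% overlap -> 1.5 s step
    windows = []
    k = 0
    while k * step + size_s <= duration_s + 1e-9:
        windows.append((k * step, k * step + size_s))
        k += 1
    return windows


def match_fraction(window: Interval, manoeuvre: Interval) -> float:
    """Fraction of the window covered by the manoeuvre (assumed definition)."""
    inter = min(window[1], manoeuvre[1]) - max(window[0], manoeuvre[0])
    return max(0.0, inter) / (window[1] - window[0])


def label_window(window: Interval, manoeuvres: Sequence[Interval],
                 threshold: float = 0.70) -> int:
    """1 if the best match with any annotated manoeuvre exceeds the threshold."""
    best = max((match_fraction(window, m) for m in manoeuvres), default=0.0)
    return 1 if best > threshold else 0


if __name__ == "__main__":
    wave = sliding_windows(5.0)           # a hypothetical 5 s wave ride
    annotated = [(1.4, 3.6)]              # one hypothetical manoeuvre
    print([label_window(w, annotated) for w in wave])
```

Under this reading, only the window fully contained in the annotated period is labelled positive, while the partially overlapping neighbours stay negative.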

Fig. 3. Example of two consecutive sliding windows.

Feature Extraction. A set of generic and domain-specific features was extracted, for each time window, from the following signals: each of the 3-axis orientation components (heading, pitch, roll) obtained after sensor fusion using the gradient descent-based algorithm of [17], the magnitude of the 3-axis linear acceleration, and the magnitude of the XY components of the linear acceleration.

Time-domain features such as minimum, maximum, variance, skewness, kurtosis, mean cross ratio, waveform length [16] and interquartile range values were calculated for each time window.
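A minimal sketch of these time-domain features for a single window is given below. The exact definitions used in the study are not stated, so the following are assumptions: population moments for variance, skewness and excess kurtosis; mean cross ratio as the number of sign changes around the window mean divided by the number of sample transitions; and waveform length as the cumulative absolute difference between consecutive samples.

```python
import statistics
from typing import Dict, Sequence


def time_domain_features(x: Sequence[float]) -> Dict[str, float]:
    """Generic time-domain features for one window (at least 2 samples)."""
    n = len(x)
    mean = statistics.fmean(x)
    centred = [v - mean for v in x]
    # Population central moments (assumed convention).
    m2 = sum(c ** 2 for c in centred) / n
    m3 = sum(c ** 3 for c in centred) / n
    m4 = sum(c ** 4 for c in centred) / n
    # Strict sign changes around the mean; samples equal to the mean are ignored.
    crossings = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return {
        "min": min(x),
        "max": max(x),
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,  # excess kurtosis
        "mean_cross_ratio": crossings / (n - 1),
        "waveform_length": sum(abs(b - a) for a, b in zip(x, x[1:])),
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    print(time_domain_features([1.0, -1.0, 1.0, -1.0]))
```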

Features based on the wavelet transformation of the input signals were also calculated. The wavelet transform decomposes a signal according to frequency, representing the frequency distribution in the time domain [22]. We used the Wavelet Packet Transform implementation of the PyWavelets (PyWt) library [15], with Daubechies 2 (db2) as the mother wavelet and a maximum decomposition level of 3, to perform a time-frequency analysis of the linear acceleration and orientation signals. For each resolution level, the relative wavelet energy was calculated for each associated frequency band [23, 24]. A metric based on the statistical variance of the wavelet coefficients was also calculated for each resolution level and associated frequency bands [27].
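For reference, a commonly used definition of the relative wavelet energy is sketched below; it is assumed here, since the exact expression is not spelled out in the text, and the notation is illustrative. Let \(C_{j,k}(n)\) denote the wavelet packet coefficients of frequency band \(k\) at resolution level \(j\), where level \(j\) contains \(2^j\) bands:

\[
E_{j,k} = \sum_{n} \left| C_{j,k}(n) \right|^{2}, \qquad
\rho_{j,k} = \frac{E_{j,k}}{\sum_{k'=1}^{2^{j}} E_{j,k'}}, \qquad
\sum_{k=1}^{2^{j}} \rho_{j,k} = 1 .
\]

Under this definition, \(\rho_{j,k}\) is the share of the energy at level \(j\) carried by band \(k\). With a maximum decomposition level of 3, this gives up to \(2 + 4 + 8 = 14\) relative energy values per input signal.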

Fig. 4. Heading evolution over time for a wave riding period with several annotated manoeuvres (top) and a manoeuvre time-window with local extrema detection (bottom).

A further set of features, intended to describe the rotational dynamics of the movements, was also extracted from the heading orientation signal. All local extrema were identified to generate minima-maxima-minima and minima-maxima-minima-maxima sequences (Fig. 4). These domain-specific features consisted of the sum, average and maximum angular displacement and velocity values for each min-max/max-min sequence.
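The following sketch illustrates how such rotational features could be derived from a heading signal. It is a simplified illustration, not the authors' implementation: it assumes an unwrapped heading in degrees, a known sampling rate (`fs_hz` is a hypothetical parameter), extrema detected as strict sign changes of the first difference (plateaus are ignored), and features computed over every pair of consecutive extrema.

```python
from typing import Dict, List, Sequence


def local_extrema(heading: Sequence[float]) -> List[int]:
    """Indices of strict local minima/maxima (sign change of the first difference)."""
    return [
        i
        for i in range(1, len(heading) - 1)
        if (heading[i] - heading[i - 1]) * (heading[i + 1] - heading[i]) < 0
    ]


def rotation_features(heading: Sequence[float], fs_hz: float) -> Dict[str, float]:
    """Sum, mean and maximum angular displacement (deg) and velocity (deg/s)
    over consecutive extrema pairs (min-max or max-min)."""
    ext = local_extrema(heading)
    pairs = list(zip(ext, ext[1:]))
    disp = [abs(heading[b] - heading[a]) for a, b in pairs]
    # Elapsed time between two extrema is (b - a) / fs_hz seconds.
    vel = [d * fs_hz / (b - a) for d, (a, b) in zip(disp, pairs)]
    feats: Dict[str, float] = {}
    for name, values in (("disp", disp), ("vel", vel)):
        feats[f"{name}_sum"] = sum(values)
        feats[f"{name}_mean"] = sum(values) / len(values) if values else 0.0
        feats[f"{name}_max"] = max(values, default=0.0)
    return feats


if __name__ == "__main__":
    # Hypothetical heading trace sampled at 10 Hz: one maximum, then one minimum.
    trace = [0.0, 10.0, 20.0, 10.0, 0.0, -10.0, 0.0, 10.0]
    print(local_extrema(trace))
    print(rotation_features(trace, fs_hz=10.0))
```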

The feature extraction step resulted in a total of 199 features for each time window.

Pipeline Optimization. A domain-specific resampling strategy was implemented to handle dataset imbalance, since the number of instances of the positive class was always lower than the number of instances of the negative class for all tested window sizes. This strategy consisted of successively discarding the non-manoeuvre segments with the highest percentages of manoeuvre match until the classes were balanced.
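As an illustration of this resampling strategy, the sketch below keeps all positive windows and only the negative windows with the lowest manoeuvre match, which is equivalent to discarding the highest-match negatives one by one until both classes have the same size. The `(label, match_fraction)` representation is a hypothetical simplification.

```python
from typing import List, Tuple

Window = Tuple[int, float]  # (class label, fraction of manoeuvre match)


def balance_by_match(windows: List[Window]) -> List[Window]:
    """Undersample the negative class by dropping its highest-match windows."""
    positives = [w for w in windows if w[0] == 1]
    # Sort negatives by ascending match, so the most ambiguous ones come last.
    negatives = sorted((w for w in windows if w[0] == 0), key=lambda w: w[1])
    return positives + negatives[: len(positives)]


if __name__ == "__main__":
    data = [(1, 0.9), (0, 0.6), (0, 0.0), (0, 0.3), (1, 0.8)]
    print(balance_by_match(data))
```

Dropping the negatives closest to the labelling threshold first removes the most ambiguous examples from the training set, which is presumably the motivation for this domain-specific choice.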

The optimization and selection of the learning pipeline was performed using a tool called Feature-based Machine Learning (FbML), created at Fraunhofer AICOS. This tool is based on the open-source project auto-sklearn [12], and allows a search space initialization via meta-learning (searching for similar datasets and initializing the hyper-parameter optimization algorithm with the configuration found) while providing a vast list of options for data pre-processing (balancing, imputation of missing values, re-scaling), feature transformation, and feature and classifier selection. As such, we explored pipelines generated with the following combinations of methods:

  1. Scalers: Standardization (zero mean and unit variance); Min-Max Scaling; Normalization to unit length; Robust Scaler; Quantile Transformer; None.

  2. Feature Transformation/Selection: Principal component analysis (PCA); Univariate Feature Selection; Classification Based Selection (Extremely Randomized Trees and L1-regularized Linear SVM); None.

  3. Classifiers: Gaussian Naive Bayes; K-Nearest Neighbors; Linear and Non-linear Support Vector Machines; Decision Trees; Random Forest; AdaBoost.

  4. Validation Strategy: 10-Fold Cross Validation.

  5. Optimization Metric: F1-score.

At each new test, the results and parameters of the 5 best classification pipelines were stored for further evaluation.

Leave-One-Session-Out Validation. In order to study model generalization for different acquisition conditions (i.e. sessions), we implemented a leave-one-session-out validation approach. At each iteration i, where \(i \in [1, N]\) and N represents the total number of different sessions, all of the instances from the surf session \(S_i\) were selected for testing while the remaining ones were used for re-fitting the pipeline. The pipeline was defined by the best combination of methods and hyperparameters which resulted from the FbML optimization with cross-validation.
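The splitting scheme can be sketched as follows, assuming each window carries the identifier of the session it came from (the identifiers below are hypothetical). The pipeline re-fitting and scoring steps are omitted.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_session_out(
    session_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held-out session, train indices, test indices) for each session."""
    for held_out in sorted(set(session_ids)):
        test = [i for i, s in enumerate(session_ids) if s == held_out]
        train = [i for i, s in enumerate(session_ids) if s != held_out]
        yield held_out, train, test


if __name__ == "__main__":
    sessions = ["s1", "s1", "s2", "s3", "s3"]  # one entry per window
    for held_out, train, test in leave_one_session_out(sessions):
        print(held_out, train, test)
```

In a scikit-learn based workflow, the `LeaveOneGroupOut` splitter provides equivalent behaviour when the session identifiers are passed as groups.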

While this approach allows us to assess the performance of the method under session-independent conditions, it is still not enough to fully assess performance in real-world conditions, since consecutive windows containing data from the same manoeuvre should be merged in order to deliver a proper count and useful information to the user. As such, we created a methodology (post-processing step of Fig. 2) which merges consecutive positive predictions in pairs, setting them to correspond to the same manoeuvre M. If, after two windows have been merged, a new, single positive prediction occurs, this window is also considered to belong to M if its classification probability is greater than that of the previous window. Otherwise, it is set to correspond to a new manoeuvre.
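One possible reading of this merging rule is sketched below; the description leaves some cases open, so the details are assumptions. Within each run of consecutive positive windows, windows are grouped in pairs. If a single window is left over after a pair, it joins that pair's manoeuvre only when its probability exceeds that of the preceding window. Otherwise, it becomes a new manoeuvre.

```python
from typing import List, Sequence


def merge_predictions(preds: Sequence[int],
                      probs: Sequence[float]) -> List[List[int]]:
    """Group positive window indices into manoeuvres (assumed interpretation)."""
    manoeuvres: List[List[int]] = []
    i, n = 0, len(preds)
    while i < n:
        if preds[i] != 1:
            i += 1
            continue
        j = i
        while j < n and preds[j] == 1:  # find the end of the positive run
            j += 1
        k = i
        while k < j:
            if j - k >= 2:
                manoeuvres.append([k, k + 1])  # merge two consecutive positives
                k += 2
            else:
                # Single leftover window: it extends the preceding pair only if
                # it directly follows it and has a higher probability.
                follows_pair = bool(manoeuvres) and manoeuvres[-1][-1] == k - 1
                if follows_pair and probs[k] > probs[k - 1]:
                    manoeuvres[-1].append(k)
                else:
                    manoeuvres.append([k])
                k += 1
        i = j
    return manoeuvres


if __name__ == "__main__":
    preds = [0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
    probs = [0.1, 0.8, 0.9, 0.95, 0.2, 0.9, 0.9, 0.6, 0.1, 0.7]
    print(merge_predictions(preds, probs))
```

The number of returned groups is the manoeuvre count reported to the user, and the first and last window of each group give its approximate temporal limits.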

Manoeuvre detection results were computed considering the predictions per window (for selection of the best window size and an overall assessment of pipelines’ performance) and after considering the post-processing needed for utilization in real-world conditions (predictions per manoeuvre), as Fig. 2 indicates.

4 Results

Table 2 combines the results of the best set of pipelines generated by the FbML after the leave-one-session-out validation considering the predictions per window, for different window sizes with a fixed overlap of 25%. These results indicate that the best overall performance was attained with windows of 2 s, associated with an F1-score (optimization metric) of 0.91. This segmentation approach was thus selected and used in all further experiments.

Table 2. Average performance of the top 5 classification pipelines obtained by the FbML considering the predictions per window, using leave-one-session-out validation.

Table 3 exhibits the final manoeuvre detection results, after the post-processing step. True positives (TP) correspond to detected manoeuvres’ time segments which overlap with annotated manoeuvres’ periods. False positives (FP) correspond to detected manoeuvres’ periods which do not overlap with annotated manoeuvres. False negatives (FN) correspond to annotated manoeuvres which were not detected, i.e. which do not overlap with any positive prediction segment.
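These overlap-based definitions, together with the usual precision, recall and F1-score formulas, can be sketched as follows. The intervals are hypothetical. Note that under this literal reading, two detections overlapping the same annotated manoeuvre would both count as true positives, which may differ from how the reported figures were computed.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two time intervals share a non-empty overlap."""
    return a[0] < b[1] and b[0] < a[1]


def detection_scores(detected: Sequence[Interval],
                     annotated: Sequence[Interval]) -> Dict[str, float]:
    """Overlap-based TP/FP/FN counts and the derived metrics."""
    tp = sum(1 for d in detected if any(overlaps(d, a) for a in annotated))
    fp = len(detected) - tp
    fn = sum(1 for a in annotated if not any(overlaps(a, d) for d in detected))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    detected = [(0.0, 2.0), (5.0, 7.0), (10.0, 12.0)]
    annotated = [(1.0, 3.0), (6.0, 8.0), (20.0, 22.0)]
    print(detection_scores(detected, annotated))
```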

Table 3. Manoeuvre detection performance after application of the time-windows’ merging criteria (real-world conditions) for each of the top 5 pipelines derived from the FbML optimization process.

All of the top 5 pipelines derived from the FbML optimization process relied on an AdaBoost classifier and performed a feature scaling step. The pipelines which implemented univariate feature selection are associated with the highest F1-scores (0.88). All pipelines were able to detect at least 95% of the annotated manoeuvres, despite the demanding circumstances under which the tests took place, since session-independence was preserved. The impact of false positives was also taken into consideration: the lowest precision was 0.76.

5 Discussion of Results

The results reported in Table 2 enabled a reasoned selection of the best segmentation approach and an overall understanding of the performance of the method. Our results indicate that, out of the tested window sizes, windows of 2 s are the most appropriate for manoeuvre segmentation and classification. Windows of 2 s led to the highest number of samples in the dataset. Moreover, considering the average manoeuvre duration in our dataset, it is reasonable to conclude that keeping segments of 2 s and an overlap of 25% between consecutive samples guarantees that most manoeuvre periods are contained in 1 window or 2 consecutive windows. This approach was thus selected as the segmentation method.

Overall, the results obtained using the predictions per window were considered very promising towards adequate manoeuvre detection using our method. However, the performance metrics exhibited in Table 3 are the most critical to understand if the method generates reliable and intelligible information for the surfer in terms of manoeuvre detection in real-world conditions.

We were able to correctly identify over 95% of all annotated manoeuvres in the dataset, while achieving a precision of up to 80%. False positives were mostly related to the following situations: 1) wave rides that finished with a failed manoeuvre, ending in a fall; 2) segments which most likely corresponded to manoeuvres, but occurred when the annotator did not have a clear view of the surfer; 3) a conservative annotation process which only considered periods of absolute certainty as ground truth manoeuvres. Optimizing the trade-off between false positive and false negative predictions is a well-known challenge in machine learning problems. As such, and considering the aforementioned situations, we consider that the reported results are appropriate and support the adequate performance of the method in real-world conditions.

Another important detail of this study is that it maintained a session-independent validation approach. Thus, the attained results support the appropriate generalization of the method to different users and different acquisition conditions, including variations in sea level and agitation and slightly different positioning of the smartphone on the users’ backs.

5.1 Challenges and Limitations of the Study

Despite the large number of collected sessions available in the dataset of [13], we were only able to use the sessions from 5 surfers, since these were the only users with the necessary level of expertise to perform in-wave manoeuvres. The final dataset was mainly composed of cutback and snap manoeuvres, with very little representation of the remaining ones. This can be considered a limitation of the study, as the performance of the method for unseen manoeuvres is uncertain.

Another limitation is related to the fine time limits for ground truth manoeuvre annotation, and the challenges of the annotation process (discussed above), which may impair a fully reliable quantification of the performance of the method [8]. A second annotation round with surf experts may be an adequate approach to tackle this limitation, followed by a comparative analysis of the expected improvements.

The labelling criteria for each time-window may also be a source of error of the method, as the definition of these criteria was empirically performed. It would also be interesting to implement and test dynamic data stream segmentation techniques, and assess if it would be possible to achieve improved fits of the generated time segments with annotated manoeuvre periods using such techniques.

6 Conclusions

This manuscript details the development of a surf manoeuvre detection algorithm, using data from the smartphone’s inertial sensors and a machine learning pipeline optimized for the problem at hand. Several time-window sizes were tested, and windows of 2 s with 25% overlap delivered the best results. Manoeuvres were detected with up to 88% F1-score under our real-world conditions validation, which is very promising for a real-world application and may have high market potential.

As future work, we intend to combine the outcome of this study with the work of [13] to create a full surf monitoring solution which simultaneously detects surf session events (namely, waves), and further segments these periods to deliver more performance metrics to the surfer concerning fine events, i.e. in-wave manoeuvres.