1 Introduction

Broadly in sports science, and particularly in rugby league football, traditional (i.e. physical and technical–tactical) performance indicators that characterise both men’s and women’s matches [1, 2] are widely used as variables in performance analysis problems. In [3], cluster analysis was carried out on data characterised by traditional indicators (e.g. high-speed running distance and total tackles) to determine the similarity between playing positions (i.e. fullbacks, centres, half-backs, wingers, hookers, prop-forwards, second-rows and loose-forwards) in professional rugby league. For the datasets characterised by technical, physical and combined indicators [3], different numbers of clusters (i.e. 3, 4 and 3, respectively) were identified, and players of different playing positions were grouped into the same clusters based on the top-ranked indicators. This finding is consistent with the ground truth [4] that broad positional groups, i.e. forwards and backs (each comprising distinct playing positions), exist in professional rugby and that players of some distinct playing positions perform similar activities. Although broad positional groups are known to exist in professional rugby league, the classification of players into these groups remains unexplored, while other classification problems [5,6,7], such as injury (status or type) prediction and competition level classification, are thoroughly researched.

Data characterised by traditional indicators can classify players into positional groups. However, they give no insight into the similar movement activities performed by players of different playing positions within and between positional groups [8]. This limitation arises because physical performance indicators, primarily derived from wearable Global Positioning System (GPS) devices [9], offer little to no context or explanation of how players accumulate the measured physical activities. Physical performance indicators (e.g. total distance covered, maximum momentum and high-speed running distance) [10] are often aggregated or reported as volumes, without regard to the order in which match activities or events occur. Player movement profiling [8, 10] now offers alternative indicators (i.e. movement patterns, also extracted from GPS data) that capture the sequential match activities completed by players and so provide the context and explanation lacking in traditional indicators.

In this study, we propose the use of data characterised by movement patterns for classifying elite rugby league players into positional groups and for uncovering the movement activities of players within the same positional group. Among other benefits, this will help to identify the sets of match-based sequential movement activities performed by players of both groups (which is useful for talent identification, training design and load management [8, 10]), to measure the extent to which exact match-based movement activities can separate players into rugby league positional groups, to establish the on-field movement activities required to profile the performance of players in each group and to develop predictive models that classify new players into each group.

Therefore, using data characterised by movement patterns, this study aims to find the best machine learning model to classify elite rugby league players into forwards and backs, identify the key movement patterns necessary for such classification, uncover the similar movement activities of different playing positions within a positional group and quantify the difference in players’ performance of key movement activities between positional groups.

This paper is organised with Sect. 2 providing background and related works; Sect. 3 gives details about GPS data conversion to sets of discrete movement sequences, movement pattern extraction, classification data generation and data analyses. Section 4 presents the result of GPS data conversion and data analysis. Finally, in Sect. 5, the results are discussed from the implementation of the proposed framework and Sect. 6 concludes the study with relevant future works.

2 Related works

Traditional performance indicators are widely used for classification tasks in sport, but studies on rugby positional group classification are scarce. In [2], 82 elite rugby league players of both forward and back positional groups were classified into competition levels (i.e. under-19 Academy and European Super League (ESL)) using 157 physical and technical–tactical performance indicators. Forwards were classified into competition levels at 68% accuracy, while backs were classified at 83% accuracy [2] by the random forest classification models. These physical, technical and tactical performance indicators are limited [8, 11] in that they aggregate series of completed activities and report them as volumes, e.g. total distance covered, average match-speed, line breaks and defensive play-the-ball losses or wins.

In a more closely related study [12], 211 elite junior Australian footballers were classified into positional groups (i.e. forward, defence, ruck or midfield) using 12 technical skill performance indicators. The linear discriminant analysis (LDA) model classified most players as midfielders and achieved an accuracy of 56.8%. The use of physical and technical–tactical indicators for competition classification [2], and of only technical skill indicators for positional group classification [12], produced suboptimal classification models, especially for positional group classification.

Besides the classification of sporting aspects, the identification of key independent variables in sports science [2, 13] is often carried out as a solution to overcome the high dimensional and multi-collinearity of independent variables. Few key variables (performance indicators in most cases) are often identified as needed for practical utility. For instance, study [14] used a repeated-measures analysis of variance to identify key variables in its assessment of the difference in external loads of two competitive levels (i.e. the National Rugby League and National Youth Competition) of rugby league players. However, this method of multiple univariate analyses of independent variables is computationally expensive as multiple analyses are executed.

In [2], the machine learning algorithm variable importance method was used to identify and select the independent variables that are important to the predictive model for the initial classification of players into competition levels. The limitation of this method includes the identification of independent variables that are suboptimal [15] and biased (specific) towards the underlying machine learning algorithm. The application of feature selection techniques [3], especially the correlation-based filter feature subset-selection method [13], has proven to be an alternative method for identifying key independent variables in sports analytics.

In [13], the correlation-based filter feature subset-selection method was used to optimise [2] classification analysis and also to identify key physical and technical-tactical indicators for classifying rugby league players into competition levels. The identified key physical and technical–tactical indicators are more useful practically than all indicators, and they also produced an increased accuracy for competition levels classification (i.e. 84.55% from 82% for forwards and 77.42% from 67.5% for backs). Additionally, based on the identified key physical and technical–tactical indicators, the difference between competition levels was assessed [13] using Cohen’s effect size.

Due to traditional indicators’ limitations [8], movement patterns emerged as alternative indicators providing the exact match-based sequential locomotive activities of players. Recently, a sequential movement pattern-mining (SMP) framework was developed for player movement profiling [11] and validated on rugby league GPS data. The framework was reported to be the better, more stable and more robust method to discretise GPS data into movement units, form (discrete) movement sequences and identify the longest common movement patterns of elite rugby league players within twenty-five clusters, respectively. The SMP framework was used by another study [16] to explore the differences in the movement patterns of elite rugby league players among three competition levels (i.e. Super League regular, Super League (semi-) finals and international rugby league matches). However, the study found no difference in the movement patterns of players among the competition levels as extracted by the SMP framework. But a further analysis, using a three-stepwise linear discriminant analysis process for dimension reduction and classification, revealed an 81% accuracy for classifying elite rugby league players into competition levels.

In [17], three distinct pattern mining algorithms (i.e. the Longest Common Sequence algorithm of the SMP framework [11], l-length closed contiguous sequential pattern mining algorithm (LCCspm) [10] and AprioriClose [18]) were used to extract different types of obtainable movement patterns to profile and classify rugby league players into two distinct playing positions (i.e. hookers and wingers) as a means to identify and thus validate the best pattern mining algorithm (and type of movement pattern) for player movement profiling. The results of [17] revealed closed contiguous movement patterns extracted by LCCspm as the best type of movement pattern for profiling and classifying elite rugby league players into selected tactical roles at 91.02% accuracy based on a multi-layered perceptron model.

Overall, existing studies provide various machine learning algorithms useful for classification tasks in sport, with accuracy as the go-to performance evaluation metric. They also reveal the extent to which movement patterns are being applied within sports analytics, particularly in rugby league, namely for classifying competition levels and two distinct playing positions. However, the use of movement patterns for classifying elite rugby league players into forward and back positional groups has not previously been explored despite its potential usefulness in practice. The key movement patterns required for classifying players into positional groups also need to be identified. To identify key independent variables, a feature selection algorithm is most suitable, and the feature importance of classification models should be avoided. In addition, Cohen’s effect size can be used to assess the differences between groups characterised by the same variables. Based on these findings, an experimental framework that addresses the identified gaps is proposed in Sect. 3.

3 Method

To understand the pattern mining process, basic definitions of terms are presented first. A movement unit, denoted by a letter of the alphabet, encodes a combination of velocity, acceleration and turning angle labels. A movement sequence is the concatenation of match-based movement units extracted from GPS data. Sets of discrete movement sequences are the subsequences of a movement sequence that remain after players’ periods of inactivity are removed. Movement patterns are frequent and consecutive movement units extracted from sets of movement sequences. Movement activities are decoded movement patterns.

3.1 Overview

The experimental framework depicted in Fig. 1 gives an overview of data collection, processing and analysis. To achieve its aim, this study used the SMP framework [11] to generate sets of discrete movement sequences, used the LCCspm algorithm [10] to extract movement patterns and conducted two phases of data analysis. The first phase involved creating the classification input data, with all unique movement patterns as independent variables, and finding the most accurate model for positional group classification. The second phase involved identifying key movement patterns for classification, re-training models for positional group classification, comparing classification models that use all unique movement patterns with those that use only key movement patterns, and assessing the differences between players of both positional groups.

Fig. 1 Experimental framework

3.2 GPS data conversion to sets of discrete movement sequences

Tracking data, per player per fixture, of elite male rugby league players who participated in The Rugby Football League (RFL) competitions over two seasons (i.e. 2019 and 2020) were used. Global Positioning System (GPS) micro-sensor units (Optimeye S5, Catapult Sports, Melbourne, Australia) were used to collect standardised and structured data at 10 Hz across fixtures. The method published in [11] was followed to process into discrete movement sequences the GPS data of elite rugby league players (i.e. 239 forwards and 203 backs) from 11 teams (i.e. Huddersfield Giants, Hull FC, Warrington Wolves, Leeds Rhinos, Hull Kingston Rovers, Wakefield Trinity, Salford Red Devils, Wigan Warriors, St. Helens, Castleford Tigers and Catalans Dragons) that participated in 349 fixtures. Elite rugby league forwards consist of players who participated in any fixture as a prop-forward, hooker, second-row or lock-forward/loose-forward. Elite rugby league backs consist of players who played in any fixture as a fullback, centre, five-eighth, half-back or winger.

To generate sets of movement sequences, per player per fixture, players’ movement units were first derived by discretising velocity, acceleration and turning angle values into qualitative locomotive labels, based on the thresholds published in the SMP framework of White et al. [11]. The velocity locomotive labels are: walk, jog, run and sprint. The acceleration locomotive labels are: acceleration, neutral and deceleration. The turning angle locomotive labels are: straight, acute-change, large-change and backwards. Hence, movement units are all possible combinations of these locomotive labels (i.e. \(4 \times 3 \times 4 = 48\)), encoded by lower- and uppercase letters (see Table 3 for examples). For each player per fixture, GPS data are processed into a sequence of movement units (e.g. uuvvuuSSHGHopdK) and later split into subsequences (i.e. a set of discrete movement sequences) after removing the movement units that represent periods of locomotive inactivity on the field (i.e. velocity less than 1.2 m/s). Each set of discrete movement sequences was generated at player-per-fixture granularity. A total of 10,811 sets of discrete movement sequences were generated for elite rugby league forwards (\(n= 5616\)) and backs (\(n=5195\)).
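The encoding and splitting steps described above can be sketched in Python as follows. This is a minimal illustration only: the letter assigned to each label combination (`UNIT_TO_SYMBOL`) and the helper `split_on_inactivity` are hypothetical, and the actual symbol mapping and discretisation thresholds are those of the SMP framework [11], which are not reproduced here. Only the 1.2 m/s inactivity cut-off is taken from the text.

```python
from itertools import product
import string

VELOCITY = ["walk", "jog", "run", "sprint"]
ACCELERATION = ["acceleration", "neutral", "deceleration"]
TURNING = ["straight", "acute-change", "large-change", "backwards"]

# Hypothetical alphabet: 26 lowercase + 22 uppercase letters cover 48 units.
# This is NOT the mapping used in the paper; it only shows the idea.
SYMBOLS = string.ascii_lowercase + string.ascii_uppercase
UNIT_TO_SYMBOL = {
    combo: SYMBOLS[i]
    for i, combo in enumerate(product(VELOCITY, ACCELERATION, TURNING))
}


def split_on_inactivity(samples, min_speed=1.2):
    """Split (speed in m/s, symbol) samples into active subsequences.

    Samples slower than min_speed are dropped and act as separators.
    """
    sequences, current = [], []
    for speed, symbol in samples:
        if speed < min_speed:
            if current:  # close the running subsequence
                sequences.append("".join(current))
                current = []
        else:
            current.append(symbol)
    if current:
        sequences.append("".join(current))
    return sequences


demo = [(2.0, "u"), (2.1, "u"), (0.5, "e"), (0.4, "e"), (3.0, "G")]
discrete_sequences = split_on_inactivity(demo)
```

Each returned string corresponds to one discrete movement sequence, and the list as a whole corresponds to one player-per-fixture set.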

Table 1 Performance metrics extracted from the confusion matrix

3.3 Movement pattern extraction and classification input data

The l-length closed contiguous pattern mining algorithm (i.e. LCCspm) [10] was applied to extract closed contiguous movement patterns from each set of generated movement sequences for both rugby league positional groups. The l parameter of the LCCspm algorithm was set to 20 (i.e. 2 s timeframe) to ensure long closed contiguous movement activities of players are extracted. The support parameter of the LCCspm algorithm was set to 5% to ensure a large number of frequent movement patterns can be identified as a higher support value will identify few frequent movement patterns.
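To illustrate what "frequent closed contiguous patterns" means, the following Python sketch uses a naive brute-force enumeration. It is not the LCCspm algorithm [10]; it assumes that l acts as a maximum pattern length, that support is the fraction of sequences in a set containing the pattern, and that a pattern is closed when no longer contiguous pattern containing it has the same support. The function names are hypothetical.

```python
from collections import Counter


def frequent_contiguous(sequences, min_support, max_len):
    """Count, for each contiguous pattern, how many sequences contain it."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for start in range(len(seq)):
            stop_limit = min(start + max_len, len(seq))
            for stop in range(start + 1, stop_limit + 1):
                seen.add(seq[start:stop])
        counts.update(seen)  # a sequence counts at most once per pattern
    threshold = min_support * len(sequences)
    return {p: c for p, c in counts.items() if c >= threshold}


def closed_patterns(frequent):
    """Keep patterns with no one-symbol-longer super-pattern of equal support."""
    closed = {}
    for pattern, count in frequent.items():
        absorbed = any(
            len(other) == len(pattern) + 1
            and pattern in other
            and other_count == count
            for other, other_count in frequent.items()
        )
        if not absorbed:
            closed[pattern] = count
    return closed


toy_sequences = ["abab", "abc", "xab"]
toy_closed = closed_patterns(
    frequent_contiguous(toy_sequences, min_support=0.5, max_len=20)
)
```

In the toy data, "a", "b" and "ab" each occur in all three sequences, but only "ab" is closed because it absorbs the two shorter patterns. The brute-force search is quadratic in the number of frequent patterns and is meant for explanation, not for data of the size used in this study.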

To generate the dataset for classification modelling, the unique movement patterns extracted across all sets of discrete movement sequences were identified by taking the union of all extracted movement patterns. This set of unique movement patterns was used as the independent variables of the classification modelling input data. The value of each unique movement pattern in the dataset is either “x” or “0”, where “x” is the relative frequency of the performed movement pattern within a fixture and “0” indicates that the movement pattern was not performed. These classification input data are referred to as the “full dataset”.
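A minimal Python sketch of this dataset construction is given below. It assumes that the relative frequency of a pattern is its count divided by the total count of all patterns extracted for that player and fixture; the text does not state the denominator, so this is an assumption, and `build_full_dataset` is a hypothetical helper.

```python
def build_full_dataset(pattern_counts_per_fixture):
    """Build a relative-frequency matrix from per-fixture pattern counts.

    pattern_counts_per_fixture: list of dicts {pattern: count}, one dict
    per player-per-fixture set of movement sequences.
    Returns (columns, rows) where columns is the union of all patterns.
    """
    columns = sorted(set().union(*pattern_counts_per_fixture))
    rows = []
    for counts in pattern_counts_per_fixture:
        total = sum(counts.values())  # assumed denominator
        rows.append(
            [counts.get(p, 0) / total if total else 0.0 for p in columns]
        )
    return columns, rows


fixtures = [{"GH": 3, "HH": 1}, {"GH": 1, "mo": 1}]
columns, rows = build_full_dataset(fixtures)
```

Patterns that a player did not perform in a fixture receive the value 0, which matches the “x” or “0” encoding described above.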

3.4 Data analyses phases

The framework shown in Fig. 1 captures the two phases of data analysis conducted. In phase I, machine learning classification algorithms were utilised to develop models for classifying rugby league players into groups, to measure how well movement patterns can classify players into groups and find the best model. Six classification algorithms (i.e. decision tree, Naïve Bayes, random forest, logistic regression, multi-layered perceptron and k-nearest neighbours) were chosen based on their distinct learning method as well as for identifying the best model through comparison of performance. The default parameters of each algorithm were used except for the k parameter of the k-nearest neighbour algorithm which was set to 5 and MLP whose hidden layer sizes were set to 1000, 500, 250, 125, 50 and 25.

The classification modelling task is to predict whether an elite rugby league player (per fixture) is a forward or back. Using the data generated in Sect. 3.3, classification models were developed via tenfold cross-validation (Fig. 1). The tenfold cross-validation technique [19] splits data into ten partitions, and nine partitions are used for training models, while the remaining partition is utilised for testing. It is repeated 10 times until each partition is used for testing the models. The result of all cross-validated models is aggregated during performance evaluation. Classification models are evaluated [20] using the following evaluation metrics: accuracy, precision, recall and F1-score (Table 1).
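The evaluation procedure can be sketched with scikit-learn as follows. The sketch rests on several assumptions that the text does not confirm: folds are stratified and shuffled with a fixed seed, Naïve Bayes is the Gaussian variant, and precision, recall and F1-score are macro-averaged over the two classes. The data are synthetic, and only the logistic regression model is run in the demo to keep it quick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Six classifiers with defaults, except k=5 and the MLP layer sizes."""
    return {
        "decision tree": DecisionTreeClassifier(),
        "naive bayes": GaussianNB(),  # assumed variant
        "random forest": RandomForestClassifier(),
        "logistic regression": LogisticRegression(),
        "mlp": MLPClassifier(hidden_layer_sizes=(1000, 500, 250, 125, 50, 25)),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }


def evaluate(model, X, y, n_splits=10, seed=0):
    """Aggregate out-of-fold predictions, then compute the four metrics."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(model, X, y, cv=cv)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Synthetic stand-in for the forwards/backs data (not the study's data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
scores = evaluate(build_models()["logistic regression"], X, y)
```

Pooling the out-of-fold predictions before scoring is one way of aggregating the ten folds; averaging per-fold scores is an equally common alternative and can give slightly different numbers.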

In phase II, key movement patterns necessary for the classification of players into positional groups were identified. From Sect. 2, the “Correlation-based feature subset” (Cfs) feature selection method was identified as a useful feature selection method in sports analytics (See [13] for details of the algorithm). It was selected and applied to the dataset containing all unique movement patterns as independent variables to identify key (movement patterns) variables. Another dataset was extracted based on the outputted movement patterns subset, identified through this process and referred to as the “reduced dataset”, which was used for re-classification modelling.
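The idea behind correlation-based feature subset selection is to score a subset highly when its features correlate with the class but not with each other. The Python sketch below computes the commonly cited merit heuristic, \(k\,\bar{r}_{cf} / \sqrt{k + k(k-1)\,\bar{r}_{ff}}\), using absolute Pearson correlations. This is a simplification for illustration: the correlation measure and search strategy used by the WEKA implementation may differ, and `cfs_merit` is a hypothetical helper.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """Merit of a feature subset: high class correlation, low redundancy.

    X: 2-D array (samples x features); y: 1-D numeric class labels;
    subset: list of column indices. Constant columns are not handled.
    """
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    # Mean absolute feature-feature correlation over all pairs.
    r_ff = np.mean([
        abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
        for i, a in enumerate(subset)
        for b in subset[i + 1:]
    ])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


y_toy = np.array([0.0, 1.0, 0.0, 1.0])
X_toy = np.column_stack([
    [0.0, 1.0, 0.0, 1.0],  # perfectly correlated with the class
    [0.0, 0.0, 1.0, 1.0],  # uncorrelated with the class
])
merit_single = cfs_merit(X_toy, y_toy, [0])
merit_both = cfs_merit(X_toy, y_toy, [0, 1])
```

In the toy data, adding the uninformative second feature lowers the merit, which is why a subset search guided by this score tends to return a small set of variables.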

Upon identifying the key movement patterns, a further analysis assessed whether there were significant differences between players of both positional groups when performing those patterns across matches. The analysis was carried out using the following Cohen’s effect size thresholds: trivial (\(\le \) 0.1), small (\(\le \) 0.2), moderate (\(\le \) 0.6), large (\(\le \) 1.2), very large (\(\le \) 2.0), nearly perfect (\(\le \) 4.0) and perfect (\(>\) 4.0). In addition, the average percentage of identified key movement patterns performed by players per positional group was assessed and presented.
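The effect size calculation and its qualitative labelling can be sketched in Python as below. The sketch assumes the pooled-standard-deviation form of Cohen's d, which the text does not specify, and applies the thresholds listed above to the absolute value of d. The function names are hypothetical.

```python
import math
from statistics import mean, variance

# Upper bounds taken from the thresholds listed in the text.
THRESHOLDS = [
    (0.1, "trivial"),
    (0.2, "small"),
    (0.6, "moderate"),
    (1.2, "large"),
    (2.0, "very large"),
    (4.0, "nearly perfect"),
]


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def effect_label(d):
    """Map an effect size to its qualitative label."""
    for upper, name in THRESHOLDS:
        if abs(d) <= upper:
            return name
    return "perfect"


# Toy relative frequencies for one pattern (illustrative values only).
forwards = [1.0, 2.0, 3.0]
backs = [2.0, 3.0, 4.0]
d = cohens_d(forwards, backs)
label = effect_label(d)
```

The sign of d only indicates which group has the larger mean, so the label is based on its magnitude.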

The extraction of movement patterns with LCCspm, the identification of the unique set of patterns, the classification modelling and the statistical analysis were carried out using the Python programming language on a private cloud Linux-based server. The source code can be found on GitHub [21]. The Scikit Learn and LCCspm libraries were used. Feature selection was conducted using the Waikato Environment for Knowledge Analysis (WEKA) GUI software version 3.8.5 [22] on the same type of private cloud Linux-based server.

4 Results

The collected GPS data were converted into 10,811 sets of movement sequences for both elite rugby league forwards and backs. An example of the result of such a conversion is presented in the appendix (Table 5). The movement sequence derived from Table 5 is “jijvvuuuuuGGGHHHzmnnnmmononbabfbrvuvwvm” after removing periods of inactivity (i.e. “eeeee”).

Table 2 Performance of all classification models for forwards and backs using all unique movement patterns

LCCspm extracted a total of 10,149 movement patterns performed by forward rugby players of which only 6476 were uniquely performed by players of the group. An example of such movement patterns is “SKKKKLLyyyzTSPTMyzyC” (denoted as Sprint-Acceleration-Straight, [Sprint-Deceleration-Straight]x4, [Sprint-Deceleration-Acute-Change]x2, [Run-Deceleration-Straight]x3, Run-Deceleration-Acute-Change, Sprint-Acceleration-Acute-Change, Sprint-Acceleration-Straight, Sprint-Neutral-Acute-Change, Sprint-Acceleration-Acute-Change, Sprint-Deceleration-Large-Change, Run-Deceleration-Straight, Run-Deceleration-Acute-Change, Run-Deceleration-Straight, Run-Neutral-Straight) as uniquely performed by elite rugby league forwards.

LCCspm extracted another total of 11,636 movement patterns performed by elite rugby league backs of which only 7963 movement patterns were uniquely performed by players of this group. An example of such movement patterns is “BoyzzDonnpppnmabbnqu” (denoted as Run-Deceleration-Backwards, Jog-Deceleration-Large-Change, [Run-Deceleration-Acute-Change]x2, Run-Neutral-Acute-Change, Jog-Deceleration-Large-Change, [Jog-Deceleration-Acute-Change]x2, [Jog-Deceleration-Backwards]x3, Jog-Deceleration-Acute-Change, Jog-Deceleration-Straight, Walk-Deceleration-Straight, [Walk-Deceleration-Acute-Change]x2, Jog-Deceleration-Acute-Change, Jog-Neutral-Straight, and Jog-Acceleration-Straight) as uniquely performed by elite rugby league backs.

Furthermore, a total of 18,173 unique movement patterns were identified across all extracted movement patterns across both positional groups.

4.1 Phase I

The set of 18,173 unique movement patterns was used as the independent variables of the data for classification modelling in this phase. Of the six classification models developed on this dataset, the Naïve Bayes model performed worst (Table 2), with an accuracy of 52.33%, a precision of 0.58, a recall of 0.51 and an F1-score of 0.37. The logistic regression model achieved the highest accuracy of 77.58%, with F1-score, recall and precision of 0.78 each. The multi-layered perceptron model, with 77.23% accuracy and precision, F1-score and recall of 0.77 each, performed only marginally below the logistic regression classifier.

Table 3 Effect sizes and average (in %) of performed key movement patterns classifying between RFL forwards and backs
Table 4 Performance of all classification models for forwards and backs using key movement patterns

4.2 Phase II

The correlation-based feature subset filter method identified 36 key movement patterns out of the 18,173 unique movement patterns used as independent variables for classifying elite RFL players into forward or back (Table 3). This subset of 36 key movement patterns (Table 3) had the highest score (0.097) out of all 744,281 subsets.

There are significant differences between players of both positional groups when performing the key movement patterns, as assessed by Cohen’s effect size. Considering only correctly predicted players who performed each key movement pattern (i.e. non-zero observations), three of the 36 key movement patterns had small effect sizes between forwards and backs (Table 3). Twenty and eleven of the 36 key movement patterns had moderate and large effect sizes, respectively (Table 3). Two of the 36 key movement patterns had very large effect sizes. Differences were also seen in how often, on average, the key movement patterns were performed per fixture by correctly predicted players of the two positional groups (Table 3).

After the classification models were re-trained on the reduced dataset, the performances of the six models were slightly lower than those of the same models on the full dataset, with the exception of the Naïve Bayes model (Table 4). The random forest model achieved the highest accuracy of 74.75%, with F1-score, recall and precision of 0.75 each. The multi-layered perceptron model, with 74.43% accuracy and precision, F1-score and recall of 0.75 each, performed only marginally below the random forest model.

The comparative analysis of the six different classification models’ performances reveals there is no drastic difference in the models’ performance (in terms of accuracy) on the data with 18,173 unique movement patterns or the data with 36 key movement patterns.

Also, a cross-tabulation of the top 36 movement patterns used by the logistic regression model with 77.58% accuracy (Table 2) against the key movement patterns used by the random forest model with 74.75% accuracy (Table 4) revealed seven overlapping patterns, namely “GH”, “HH”, “HHG”, “uuvuuv”, “GHH”, “mo” and “qv”, which are decoded in Table 3.

5 Discussion

Broad positional groups exist in rugby league football [4], and an existing clustering analysis identified distinct playing positions within the same cluster (i.e. positional group) [3] but did not reveal the movement activities that brought about the similarity, because traditional indicators are limited. This study addresses this problem by classifying players into positional groups (i.e. forwards and backs) using data characterised by movement patterns, identifying the key movement patterns necessary for such classification and quantifying the differences in the movement activities performed by players of both positional groups. The LCCspm algorithm was used to extract frequent and closed contiguous movement patterns, and a total of 18,173 unique movement patterns were derived for both elite rugby league forwards and backs. Using all unique movement patterns as independent variables produced more accurate models than using only the key movement patterns for five of the six classification algorithms. The highest classification accuracy of 77.58%, with a recall of 0.77 and F1-score and precision of 0.78 each, was achieved by a logistic regression model on the dataset with 18,173 unique movement patterns (Table 2). Also, 36 of the 18,173 unique movement patterns were identified as key independent variables (Table 3) and produced their highest classification accuracy of 74.75%, with F1-score, recall and precision of 0.75 each, through the random forest classification model (Table 4). Eleven and two of the 36 key movement patterns had large and very large Cohen’s effect sizes, respectively, between elite rugby league forwards and backs (Table 3).

To reveal the movement activities of players within the same positional group but in distinct playing positions, this study used movement patterns instead of traditional performance indicators. Rugby league backs comprise more distinct playing positions (\(n=5\)) than forwards (\(n=4\)), and their movement patterns are more often characterised by backwards running (e.g. “LLyzBBmmnnnpnmmmmmvu” and “zABoyzzDonnpppnmabbn”) than those of forwards. This corroborates study [23], which reported that rugby league backs have more space during matches and have to face the opposition when retreating to the defensive line. Elite rugby league backs performed more (unique) movement patterns than forwards, which may be due to the greater number of backs fielded during matches (\(n=7\)) than forwards (\(n=6\)) as well as to backs comprising more distinct playing positions. Rugby league forwards’ movement patterns are often characterised by more accelerated running and sprinting (e.g. “GGGGGGGGGCCCCCCCCCC” and “SSSTSSTS”). These movement patterns, when decoded into movement activities, can help with training prescription, whereby elite rugby league players are conditioned based on their exact movements during competitive matches.

The search for the best among distinct models to classify elite rugby league players into forwards and backs yielded the highest classification accuracy of 77.58%, via logistic regression using all 18,173 unique movement patterns. This study’s classification analyses used the relative frequency of performed movement patterns as the independent variable values, whereas [17] used values that only indicated whether or not a player performed a movement pattern within a fixture. This demonstrates that quantifying how frequently movement patterns are performed by players can also be used for classifying elite rugby league players into positional groups at an excellent classification accuracy.

The total number of unique movement patterns used as independent variables for classification modelling is extremely high and not feasible for practical application. This study resolved this high-dimensionality problem by identifying the key movement patterns from the 18,173 extracted movement patterns. Thirty-six of the 18,173 movement patterns were identified by the correlation-based feature subset filter method as the key movement patterns (Table 3). This is about 0.2% of the total number of extracted movement patterns. Upon re-training and re-evaluating the models on data characterised by only the key movement patterns, their performances were not drastically lower than those on the full dataset despite the drastic reduction in the number of independent variables. For example, the decision tree classification model had an accuracy of 65.53% on the full dataset and 64.73% on the reduced dataset. On the other hand, the Naïve Bayes model increased its accuracy from 52.33% to 68.41%, which highlights the problem of multi-collinearity in the full dataset. Hence, the 36 key movement patterns are recommended for use in practice.

Significant differences between elite rugby league forwards and backs were observed in the key movement patterns, based on Cohen’s effect size and the average percentage of performed movement patterns. Two of the 36 key movement patterns had very large effect sizes between forwards and backs. Twenty and eleven of the 36 key movement patterns had moderate and large effect sizes, respectively (Table 3). The effect sizes indicate that there are significant differences in how frequently key movement patterns are performed between correctly predicted elite rugby league forwards and backs. For example, in Table 3 the movement pattern “HGH” (denoted as Run Acceleration Acute-Change, Run Acceleration Straight, Run Acceleration Acute-Change) had a large Cohen’s effect size and was performed on average by backs at a relative frequency of 11.1% per fixture, compared with 6% per fixture for forwards. The movement pattern “uuvuvv” (denoted as Jog Acceleration Straight, Jog Acceleration Straight, Jog Acceleration Acute-Change, Jog Acceleration Straight, Jog Acceleration Acute-Change, Jog Acceleration Acute-Change) had a moderate Cohen’s effect size and was performed on average by forwards at 11.1% relative frequency per fixture, compared with 7.7% per fixture for backs. The movement pattern “jiiii” (denoted as Walk Acceleration Acute-Change, Walk Acceleration Straight, Walk Acceleration Straight, Walk Acceleration Straight) had a very large Cohen’s effect size and was performed on average at 5.5% relative frequency by forwards, compared with 10% by backs. Another interesting key movement pattern is “SS” (denoted as Sprint Acceleration Straight, Sprint Acceleration Straight), performed per fixture by forwards at 13.7% relative frequency and by backs at 10.7% relative frequency, with a moderate Cohen’s effect size. These key movement patterns, especially those involving accelerated running and sprinting, contrast with the findings of study [24], which reported that rugby league backs undertake more running than forwards during fixtures.

The overall performance of the classification models for classifying rugby league players into forwards and backs is higher than that of comparable classification models using traditional performance indicators in sports analytics. For example, in [12], an accuracy of 70.1% was reported for classifying elite junior Australian football players into midfield, defence, forward or ruck positional groups using technical performance indicators. The classification results of this study based on movement patterns (i.e. 77.58% using all unique movement patterns and 74.75% using key movement patterns) provide useful predictive models for new rugby league players in practice. Also, the identified key movement patterns can be used by rugby league coaches to improve the training specificity of players between the groups as well as to help with talent identification and recruitment. Additionally, in [3], clusters containing players of distinct playing positions were identified based on traditional indicators, which do not reveal the match-based locomotive activities that led to such similarity groupings. This study revealed the match-based movement activities that led to such groupings and presented a model for classifying rugby league players into positional groups.

6 Conclusions, limitations and future works

This study fulfilled its aim by extracting frequent closed contiguous movement patterns from elite rugby league players’ sets of movement sequences, generated from GPS data, to classify players into broad positional groups (i.e. forwards and backs) and to identify key movement patterns for practical use. The classification analyses measured the extent to which exact player movement activities can classify elite rugby league players into forwards and backs and helped to identify the key movement patterns necessary for such classification. Logistic regression fitted on the dataset with 18,173 unique movement patterns as independent variables, with the relative frequency of performed patterns as values, was the most accurate classification model, with 77.58% accuracy. Thirty-six key movement patterns were identified, which also produced another high-performing classification model, a random forest with 74.75% accuracy. Differences in the match-based movement activities of elite rugby league forwards and backs, as well as similarities in the movement patterns of distinct playing positions within each group, were identified. Based on the findings of this study, movement patterns can be used to classify elite rugby league players into forward and back positional groups, thereby enabling coaches and trainers to develop position-specific training programmes while helping scouts to identify and recruit talent. Moreover, the random forest classification model with the key movement patterns can be used in practice to predict the positional group (i.e. forward or back) to which a rugby league player belongs. Although the methods implemented in this study can be replicated across various sports, the conclusions are limited to elite male rugby league players who played in RFL competitions. In the future, the (parameter) optimisation of the classification model for positional groups will be considered. The classification of players into nine rugby league playing positions, the identification of signature movement pattern(s) specific to each playing position and the movement patterns’ contribution to each playing position will also be considered.

Table 5 Example of conversion of GPS data into movement sequence

Supplementary information An example of converting GPS data into movement sequence is given in Appendix A.