Cluster analysis successfully identifies clinically meaningful knee valgus moment patterns: frequency of early peaks reflects sex-specific ACL injury incidence
Biomechanical studies of ACL injury risk factors frequently analyze only a fraction of the relevant data, and typically not in accordance with the injury mechanism. Extracting a peak value within a time series of relevance to ACL injuries is challenging due to differences in the relative timing and size of the peak value of interest.
The aim was to cluster analyze the knee valgus moment time series curve shape in the early stance phase. We hypothesized that 1a) There would be few discrete curve shapes, 1b) there would be a shape reflecting an early peak of the knee valgus moment, 2a) youth athletes of both sexes would show similar frequencies of early peaks, 2b) adolescent girls would have greater early peak frequencies.
N = 213 (39% boys) youth soccer and team handball athletes (phase 1) and N = 35 (45% boys) with 5 year follow-up data (phase 2) were recorded performing a change of direction task with 3D motion analysis and a force plate. The time series of the first 30% of stance phase were cluster analyzed based on Euclidean distances in two steps; shape-based main clusters with a transformed time series, and magnitude based sub-clusters with body weight normalized time series. Group differences (sex, phase) in curve shape frequencies, and shape-magnitude frequencies were tested with chi-squared tests.
Six discrete shape-clusters and 14 magnitude based sub-clusters were formed. Phase 1 boys had greater frequency of early peaks than phase 1 girls (38% vs 25% respectively, P < 0.001 for full test). Phase 2 girls had greater frequency of early peaks than phase 2 boys (42% vs 21% respectively, P < 0.001 for full test).
Cluster analysis can reveal different patterns of curve shapes in biomechanical data, which likely reflect different movement strategies. The early peak shape is relatable to the ACL injury mechanism as the timing of its peak moment is consistent with the timing of injury. Greater frequency of early peaks demonstrated by Phase 2 girls is consistent with their higher risk of ACL injury in sports.
KeywordsACL Biomechanics Cluster analysis Data mining Injury risk
Anterior cruciate ligament
Anterior cruciate ligament (ACL) injuries result in considerable societal burden (Kiadaliri et al., 2016), explaining extensive and ongoing research efforts to prevent them. Cadaver studies have demonstrated that the ACL can be loaded through a knee valgus moment (VM) (Markolf et al., 1990), and that the VM is an important contributor to the multi-planar loads that produce clinically meaningful injury patterns (Bates et al., 2018). A landmark study by Hewett et al. (Hewett et al., 2005) revealed that the knee valgus moment was a risk factor for ACL injury, but had important limitations. The total number of injured players was low (N = 9), leading to a high chance of false discoveries (Christley, 2010; Colquhoun, 2014). Furthermore, the study used a bilateral drop-jump, a movement which typically does not result in athletic ACL injuries (Montgomery et al., 2018; Walden et al., 2015).
Recent studies using similar methodology (Krosshaug et al., 2016; Leppanen et al., 2017) have not replicated the results of the Hewett study (Hewett et al., 2005) and the observation has been made that biomechanical risk factor studies seldom account for the ACL injury mechanisms in their analyses (Dai et al., 2014) which may explain their inconsistent results. While ACL injuries occur shortly after contact with the ground (Koga et al., 2010; Krosshaug et al., 2007), prospective studies have extracted peak values over the complete weight acceptance phase (Hewett et al., 2005; Krosshaug et al., 2016; Leppanen et al., 2017). The timing of global peaks occur during mid- to late weight acceptance phase, which is inconsistent with that of ACL injury (Sigurethsson et al., 2018). A key difficulty in extracting the peak value of the knee VM during the critical early contact phase is the variability in the waveform of the calculated VM signal, which doesn’t always have a discrete peak in the early phase (Sigurethsson et al., 2018).
Augmenting traditional biomechanical approaches with machine learning tools, such as cluster analysis (Halilaj et al., 2018) has been suggested as a means for opening new avenues of research. Identifying a waveform consistent with the mechanism of ACL injury is a classification problem that may be solved with cluster analysis. To date, no method has been published that clusters joint moment waveforms into different shapes.
The primary aim of this study was to test the feasibility of using cluster analysis to identify different shapes of VM waveforms in the early weight acceptance phase of a change of direction task, a movement during which ACL injuries occur (Walden et al., 2015). Our hypotheses were; 1a) the waveforms may be classified into a small number of categories, 1b) at least one of the resulting clusters will have an early peak consistent with the timing of ACL injury (Krosshaug et al., 2007).
A secondary aim was to compare the frequency of the early peak waveform between the sexes before and after puberty. Our hypotheses were that; 2a) before adolescence, athletes will show an identical frequency of early peaks, 2b) after adolescence girls will have greater frequency of early peaks, consistent with the 2-3x greater risk of sports related ACL injuries reported in the literature (Montalvo et al., 2018; Nicholls et al., 2018).
Design and setting
Prospective cohort laboratory study.
No. of Trials
No. of Trials
Data collection methods have been previously described by Briem et al. (Briem et al., 2017). In short, height and weight were measured before a short warm-up on a stationary bike. Strength testing of hip muscles in abduction and external rotation was performed.
After strength testing, 46 reflective markers were placed on the subject, 4 on each foot, one per malleolus, a 4 marker cluster on each shank, one per femoral condyle, a 4 marker cluster on each thigh, a 3 marker cluster on the sacrum, one on each greater trochanter of the femur and on the highest point of each iliac crest, on bilateral anterior superior iliac spines, on the thorax (approximately t10-t12), on the c7, on the sternum, and on the lateral aspects of each scapular acromion.
A static trial was recorded, and anatomical markers were removed (trochanteric, malleolar, condylar, and iliac crests) before the dynamic movement trials. Subjects performed 5 repetitions of a change of direction task on each leg, and 5 repetitions of a bilateral drop-jump from a 23 cm (youth) or 30 cm (adolescents) box. Movement tasks were repeated after a 5 min skateboard exercise protocol and all conditions were pooled for this analysis. The order of movement trials was randomized with an online randomizer in phase 2 (Random.org, 2016), and a coinflip in phase 1.
Data processing and statistical analysis
An 8 segment, 48 degree of freedom, musculoskeletal model was constructed in Visual3D (C-Motion) consisting of feet, shanks, and thighs of both lower extremities, in addition to a pelvis and a trunk. Ankle joint centers were defined as midway between malleolar markers, knee joint centers as midway between femoral condyle markers, hip joint centers as 25% of the distance between trochanteric markers, and the pelvis-trunk joint as midway between the iliac crest markers. Visual3D default settings were used for all segment inertial parameters.
Calculations of kinematics were performed using the 6 degree-of-freedom method and inverse kinetics were calculated for joint moments. Joint moments were normalized by subject body weight, since the tensile strength of the ACL ligament also scales with body weight (Chandrashekar et al., 2006). Time series data of the stance phase of a change of direction task was exported from Visual3D (C-Motion) and imported into R (Team, 2018) for analysis. Video analysis of ACL injuries have revealed that ACL injuries occur in the initial 50 ms after contact with the ground (Krosshaug et al., 2007). However, these descriptions of ACL injuries most often involve high level athletes (Koga et al., 2018) due to the availability of match video recordings. With that in mind, we observed that the fastest athletes in our cohort who displayed an early peak knee VM did so close to the 50 ms mark, which was generally within the first 25% of the stance phase. In order to ensure that slopes on either side of the peak waveform would be captured, data from the first 30% of stance were selected for the cluster analysis.
A transformation was then performed by calculating the lagged differences of the series and taking its sign. Thus, if a VM data point was higher than that found in the prior frame it was given the value 1, whereas if the data point was lower than that in the prior frame, a value of − 1 was given. Each time series was therefore reduced to its directional changes (increasing or decreasing), representing its waveform. The Euclidean distances between the transformed waveforms were calculated (Montero & Vilar, 2014) and clusters formed using the Ward.D2 (Charrad et al., 2014; Murtagh & Legendre, 2014) method which produces compact spherical clusters.
To decide on a number of clusters to produce, the C-Index (Hubert & Levin, 1976) was calculated for total cluster numbers from 2 to 50 clusters. As there was no distinct elbow in the C-Index plot, a number of clusters was selected based on a C-Index cut-off value of 0.05. The resulting clusters were visually examined and assigned to groups based on similarities in their appearance. Individual curves within a cluster were examined when the aggregated cluster appearance was unclear.
In order to differentiate between different magnitudes of similar shapes of knee VM data, a second cluster analysis step was performed. All curves within each shape were interpolated and divided by bodyweight in kg. The Euclidean distances between them were calculated and using the Ward.D2 method (Murtagh & Legendre, 2014), 2–4 sub-clusters based on force magnitude were formed. The lowest C-Index value out of the result was selected. Each of the resulting sub-clusters were then examined and classified as either a small, medium, or a large magnitude.
No specific cut-offs have been commonly accepted to determine the quality of clusters formed with cluster analysis. Instead, the cluster analysis process was visually inspected to confirm that the intended goal of discrete shapes in the VM waveform was reached. For the secondary aims of determining sex- and age-dependent differences in the frequency of the early peak VM shape, a chi-square test was performed on the frequency distribution of the clusters by sex and maturity where each individual trial was the unit of study. Significance level was set at 0.05.
After screening for errors in performing the side-step maneuver as well as removing trials with large artifacts, 4903 attempts out of the 5080 collected were available for analysis.
The observed and expected frequencies of the six shape-based clusters representing the knee valgus moment
Observed and expected frequencies for shape based clusters and magnitude based sub-clusters
Observed (Expected) Frequencies
The main results of this study are in line with hypothesis 1a, i.e. that the two-step clustering process reported can differentiate between six different curve shapes of the knee VM during the early stance phase, and 2–3 different magnitudes within each shape. Moreover, one of the shapes identified was the early peak, consistent with hypothesis 1b. In phase 1 boys had a greater relative frequency of early peaks, in contrast to hypothesis 2a. However, consistent with hypothesis 2b, girls in phase 2 did have a greater relative frequency of early peaks with a ratio of 2:1, consistent with the reported 2-3x higher incidence of ACL injuries for adult females (Montalvo et al., 2018; Walden et al., 2011).
The Van Mechelen model of injury prevention is an established framework to guide preventative research (van Mechelen et al., 1992). The model emphasizes the need to first establish the aetiology and mechanisms of injury before implementing interventions. Extensive research has been conducted on the mechanism of injury, including cadaver models of ACL injuries (Bates et al., 2018), but very little work has been done to discover how components of injury mechanisms are manifested in non-injury movements. This dearth of cross-sectional research has resulted in prospective studies that are largely exploratory (Hewett, 2019). The relatively low incidence of ACL injuries (Montalvo et al., 2018) means that the ACL-injured cohort in prospective risk factor studies is likely to be small, with a resulting elevated risk of both false positive and false negative results (Christley, 2010; Colquhoun, 2014). Rather than exploration, risk factors tested in prospective studies should be grounded by theory to produce more robust findings.
That is the context in which this study is placed. We present cluster analysis of knee VM waveforms as a novel method to identify the existence of an important component of the ACL injury mechanism, the early peak VM (Bates et al., 2018). Although ACL injuries are more common for female athletes than for male athletes (Montalvo et al., 2018), the injury mechanism is likely the same (Owusu-Akyaw et al., 2018). Therefore, a factor that is related to the injury mechanism in addition to being more common for female athletes has potential to be a true risk factor. Future research should examine a connection of the early peak VM waveform to kinematics observed during ACL injury, and conduct prospective cohort studies to establish a connection to injury risk.
This is to our knowledge the first study to have used cluster analysis techniques on 3D motion capture data. During motion capture, a number of different time series are calculated resulting in thousands of data points per measurement. Traditionally, these thousands have been reduced to a small number of single values such as local or global peaks (Leppanen et al., 2017; Torry et al., 2011) which can be input into a statistical model. Our results show that reducing different curve shapes of the knee VM to a single data point in such a manner results in many of the extracted data points being essentially unrelated to the timing of ACL injury. This likely weakens data analysis of such studies in terms of statistical power and the clinical relevance of the results.
Assessment of homogeneity of shape within each cluster was performed via visual inspection. This requires a certain level of clinical judgement which may not be reliable between different assessors. The reader is encouraged to examine the results of the clustering process using our data and analysis script from the online depository (see data availability statement).
The C-Index of our cluster analysis is reported, but currently there is no consensus on what constitutes a good C-Index or how the number of clusters should be decided. We used the first elbow of the C-Index graph, or a 0.05 cut-off of a smooth graph. The potential number of clusters in our data is 1025 unique shapes, some of which likely differ only in the exact location of local maxima or minima. It’s possible, but in our opinion unlikely, that using 6 initial clusters would yield the same 6 shapes presented as the basic shapes in the present study, or that using 100 initial clusters would yield superior results.
We have reported results from only 35 athletes in the adolescent cohort. The choice to use a subset of the available data was made due to the exploratory approach undertaken. A sample size of 35 is common in biomechanical studies, and since 20 trials are collected from each athlete the study is adequately powered for the chi-squared test. Future studies with larger cohorts are required to confirm the frequency of the early peak VM waveform.
Conclusions and clinical relevance
This is to our knowledge the first study that demonstrates that clustering techniques are feasible to extract meaningful information from biomechanical data with relevance to a specific injury mechanism. A small number of distinct shapes of early stance phase knee VM curves exist and can be identified with a cluster analysis procedure. The early peak knee VM shape is consistent with the ACL injury mechanism, since the injury occurs early and the knee VM can strain the ACL. The sex-specific frequencies of the early peak shape in adolescence is consistent with the sex-specific difference in ACL injury incidence and may be a predisposing factor to injury. These findings should inform prospective risk factor studies as well as studies on kinematics related to the early peak waveform.
The authors would like to acknowledge Prof. Ólafur Pétur Pálsson and Ragnar Leví Guðmundarson for valuable input regarding the data analysis.
HBS collected phase 2 data, designed and performed the data analysis and wrote the manuscript. KB designed the study, participated in planning of the data analysis, and provided substantial revision of the manuscript. Both authors read and approved the final manuscript.
Funding for the study was provided by The Icelandic Centre for Research (Rannís – Rannis.is), funding codes 120410021, 903271305, 1203250031, and 185359051. The Football Association of Iceland has provided travel funds to the lead author to present findings at a conference. No funding source was involved in the study design, the execution of the study, the data analysis, writing the report, the decision to publish the results, or writing the article.
Ethics approval and consent to participate
Ethics approval granted by the Icelandic National Bioethics committee, approval code VSNb2012020011/03.07. All participants, together with a legal guardian, received information regarding the study protocol, including known risks and benefits, and signed an informed consent statement.
Consent for publication
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.