Interpretable machine learning models for classifying low back pain status using functional physiological variables

To evaluate the predictive performance of statistical models which distinguishes different low back pain (LBP) sub-types and healthy controls, using as input predictors the time-varying signals of electromyographic and kinematic variables, collected during low-load lifting. Motion capture with electromyography (EMG) assessment was performed on 49 participants [healthy control (con) = 16, remission LBP (rmLBP) = 16, current LBP (LBP) = 17], whilst performing a low-load lifting task, to extract a total of 40 predictors (kinematic and electromyographic variables). Three statistical models were developed using functional data boosting (FDboost), for binary classification of LBP statuses (model 1: con vs. LBP; model 2: con vs. rmLBP; model 3: rmLBP vs. LBP). After removing collinear predictors (i.e. a correlation of > 0.7 with other predictors) and inclusion of the covariate sex, 31 predictors were included for fitting model 1, 31 predictors for model 2, and 32 predictors for model 3. Seven EMG predictors were selected in model 1 (area under the receiver operator curve [AUC] of 90.4%), nine predictors in model 2 (AUC of 91.2%), and seven predictors in model 3 (AUC of 96.7%). The most influential predictor was the biceps femoris muscle (peak β\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document}  = 0.047) in model 1, the deltoid muscle (peak β\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} =  0.052) in model 2, and the iliocostalis muscle (peak β\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} =  0.16) in model 3. The ability to transform time-varying physiological differences into clinical differences could be used in future prospective prognostic research to identify the dominant movement impairments that drive the increased risk. These slides can be retrieved under Electronic Supplementary Material.


Introduction
Low back pain (LBP) has a global prevalence of close to half a billion [1] and ranks as the number one cause of years lived with disability [1]. LBP incurs a significant socioeconomic cost, with £1.6 billion incurred as health-related Electronic supplementary material The online version of this article (https ://doi.org/10.1007/s0058 6-020-06356 -0) contains supplementary material, which is available to authorized users.
* Bernard X. W. Liew bl19622@essex.ac.uk; liew_xwb@hotmail.com Extended author information available on the last page of the article expenditures, and £9.1 billion incurred as loss economic productivity within the UK alone [2]. Being able to predict the clinical course of LBP-i.e. whether it improves, persists, relapses, or worsens-is highly relevant since such knowledge would guide clinical expectations of recovery and would assist clinicians in matching different clinical phenotypes to specific interventions. Increasingly, researchers are turning towards advance statistical learning techniques to develop accurate prediction models for people with LBP using information from high-dimensional, multivariate biological signals [3]. Existing studies have used biological signals such as surface electromyography (sEMG) [4,5], kinematics [6,7], brain neuroimaging [3], and spine neuroimages [8] as candidate predictors; feeding into statistical learning techniques such as support vector machine (SVM) [3][4][5][6], neural networks [7], and natural language processing [8]. The excellent predictive accuracy of current models developed using a spectrum of biological signals and statistical techniques has demonstrated the potential for such methods to assist clinical decision-making [3][4][5][6][7][8].
Despite their predictive value, prohibitive barriers exist towards a more generalized integration of advance predictive models into routine clinical practice. First, some biological signals, such as brain neuroimages [3], are not feasible to be collected as routine in most clinical settings. Second, candidate biological signals should ideally be collected during activities that are routinely assessed clinically, rather than more complex sporting manoeuvres such as golf [4]. Third, the current statistical techniques that are used do not produce solutions that are easily interpretable. For example, the quantitative influence of each biological predictor on the response cannot be easily determined using neural networks and SVM. Fourth, the number of features that can be extracted from biological signals can be large, whilst parsimonious models may be more desirable clinically. A parsimonious model is not only more interpretable, but also it reduces the operational burden of subsequent data collection.
Lifting is an ideal task from which biological signals can be extracted to discriminate LBP status, given that it is commonly implicated as a risk factor for future LBP [9], it is a task that LBP individuals may fear performing [10], and it is a task which commonly provokes pain [11]. Compared with healthy controls, people with LBP typically lift with greater thoraco-lumbar spinal flexion [12]; more antiphase lumbar-hip coordination [13]; greater accessory spinal movements [14]; greater erector spinae, latissimus dorsi, rectus abdominis, and external oblique activity [15,16]; and greater trunk flexor-extensor co-activation [15,17]. Surprisingly, no studies have considered the discriminatory value of these kinematic and EMG variables when used within a functional, time-series context. Functional variables may provide more information on an individual's movement impairments than scalar variables (e.g. means, peaks), given that the former contains both magnitude and temporal information. For example, physical impairments such as impaired flexion-relaxation response of the erector spinae have been found early in the trunk flexion phase [18]. This is in addition to the fact that spinal loads are not constant across the lifting and lowering phases [19].
The primary purpose of the present study is to develop and determine the predictive performance of statistical models to distinguish different LBP sub-types and healthy controls from each other, using whole-body electromyographic and kinematic variables as predictors collected during a functional lifting task. Three models were developed to distinguish individuals with a current episode of LBP (LBP) from asymptomatic individuals (con); individuals in symptom remission (rmLBP) from controls; and individuals with current LBP from those in remission. Herein, we used a state-of-the-art machine learning technique, termed functional data boosting (FDboost). Potential benefits of FDboost in the present context are that it can perform variable selection simultaneously with model fitting.

Design
This single-session study was conducted within the motion capture laboratory of the Centre of Precision Rehabilitation for Spinal Pain, University of Birmingham, UK, between 1 May 2018 and 31 October 2018. The study obtained ethical approval from the Ethics Committee of the University of Birmingham, UK (ERN_17-1717). All participants provided written informed consent prior to participation, and the study adhered to the Declaration of Helsinki.

Participants
Participants between 18 and 55 years old with adequate conversational English language were invited to volunteer. Participants were eligible to be included into one of three groups based on the following criteria: • Current low back pain (LBP): present episode of LBP lasting > 24 h, with a minimum intensity on the Numerical Rating Scale (NRS) score of greater than or equal to 2/10 (0 no pain, 10 being maximal pain) [20]. • Remission low back pain (rmLBP): presently in symptom remission from a LBP episode experienced within the last year, with an NRS score of less than or equal to 1/10 • Controls: no relevant history of LBP that limited their function and/or required treatment from a health professional in the past year.
Participants were excluded if they had previous spinal surgery, spinal fracture, rheumatologic, metabolic, infectious conditions as self-reported, with ability to perform at least 10 full spinal flexion repetitions on screening, pregnancy, and any medical conditions which preclude safe execution of lifting.

Lifting task
Participants performed a repetitive low-load (7% body weight [BW]) [21] lifting task of a basket (30 × 36 × 10 cm) from the ground surface, with the midpoint of the basket's handle positioned 25 cm forward horizontally from the midpoint of the foot on the ground. A horizontal distance of 25 cm was close to a distance previously used [22], which could reasonably be performed by LBP individuals repeatedly as evaluated in pilot tests.
Lifting was performed barefooted, with the position of the foot fixed at 30 cm between the bilateral malleoli [23], with participants freely selecting a toe-out angle. Participants were instructed to "lift in a way that is most comfortable", such that participants could lift in any bimanually symmetrical style [24]. Participants were also instructed to keep their heels on the ground and to maintain a broadly consistent lifting style throughout the task. A lifting repetition (defined below) where the heels lifted off the ground or participants grossly changed their lifting style (e.g. from a stoop fully extended knees to a squat with fully flexed knees) was rejected, and participants were reminded of the task requirements. Participants performed six sets of five consecutive repetitions of lifting, with an inter-set rest period of at least 5 min. Lifting was performed at a self-determined pace, with the natural frequency of lifting determined during practice, and subsequently fixed to that frequency using an auditory metronome.

Assessment
Marker trajectories were captured with eight motion capture cameras sampling at 250 Hz (BTS SMART-DX 6000, BTS Bioengineering Corp, Italy). Retroreflective 14-mm markers were placed on the feet bilaterally (first and fifth metatarsophalangeal head, posterior surface mid-calcaneus), pelvis (bilateral anterior and posterior superior iliac spines), bilateral acromion, and posterior surface of the mid-distal radioulnar joints (wrist).

Data processing
A virtual landmark termed "pelvis" was calculated using the proximal endpoint of the modelled inertial pelvic segment ("pelvis") using the segment inertial and geometric properties of Dempster et al. [29] and Hanavan et al. [30], respectively. A virtual coordinate system was created with three virtual landmarks: the origin at the midpoint of bilateral calcanei marker projected onto the floor, the midpoint of the bilateral first MTP marker projected onto the floor, and a landmark projected 10 cm vertically above the origin. The vertical and anterior-posterior (AP) linear displacements of the following markers/landmarks were calculated relative to the virtual coordinate system, normalized to the participant's height, for use as predictors in the statistical model: right wrist (given task symmetry), right acromion, and pelvis. Marker trajectories were filtered with a low-pass, zero-lag, fourth-order, Butterworth filter (6 Hz).
EMG signals were high-pass-filtered with a fast Fourier transform at 40 Hz to remove the electrocardiogram artefact. Subsequently, the signals were rectified and low-pass-filtered with a zero-lag, fourth-order, Butterworth filter (5 Hz) [31]. The maximal EMG amplitude of each muscle per repetition was extracted and averaged within a set to create a normalizing factor [32]. EMG amplitude of each muscle per lifting set was divided by the normalizing factor. The RA EMG signals were excluded due to movement artefact occurring in individuals who lifted with significant magnitudes of trunk flexion, such that the electrodes were lifted from the body surface.
One lifting repetition was defined from the time when the load left the ground, to a fully upright body posture, and when the load contacted the ground again. Two phases were defined: lifting started when the positive-vertical velocity of the right acromion marker exceeded 10%, and ended when it dropped below 10% of its peak vertical velocity during each set. Lowering began when the negative-vertical velocity of the right acromion marker exceeded 10%, and ended when it dropped below 10% of its peak vertical velocity during each set. Segmentation of the kinematic and EMG signals was undertaken independently for each lifting and lowering phase and was time-normalized to 101 data points.

Statistical model
Simple linear regression and logistic regression were used for between-group comparisons in the baseline demographic and pain characteristics, for continuous and categorical variables, respectively.
Three scalar-on-function (SoFR) logistic regression models were used for binary classification of LBP statuses (model 1: con vs. LBP; model 2: con vs. rmLBP; model 3: rmLBP vs. LBP). A SoFR model is one where the response variable takes on scalar values, and the covariates take on functional (or scalar) values. Functional regression models are extensions of standard regression models such as generalized additive models. Fourteen EMG and six kinematic (vertical and AP displacements of three markers/landmarks) functional variables were available for each lifting and lowering phase, making a total of 40 predictors. For each model, EMG and kinematic variables were independently screened for high collinearity, and variables which exhibited a high absolute correlation of > 0.7 with all other EMG and kinematic covariates [33], respectively, were excluded. The number of functional predictors retained was 30 for model 1 and 2 and 31 for model 3. All functional variables were demeaned as a pre-processing step, so that different predictors had equal potential to be included in the model. For each model, we adjusted for the effects of sex (male vs. female) by including it as a predictor, meaning that the final number of predictors included in model 1 was 31, model 2 was 31, and model 3 was 32.
We used component-wise gradient boosting for model fitting [34]. The algorithm is an iterative procedure which successively adds one covariate to the model, like a sequential forward stepwise regression, with the ability to handle functional covariates, perform variable selection, and allow for penalized estimation. In order to estimate the optimal number of iteration to optimize the negative log-likelihood of the Bernoulli distribution, cross-validation was performed on 25 bootstrap samples of the data, each with a roughly similar ratio of individuals in each group. The area under the receiver operating characteristic curve (AUC) was used to quantify the model's ability to discriminate the two groups. All analyses were performed using R version 3.5.3, using the "FDboost" package [34], and the codes with results are found in the following repository [35].

Results
Forty-nine participants participated in the study (control = 16, rmLBP = 16, LBP = 17), the demographic and clinical characteristics of which are shown in Table 1. The group-averaged kinematic and EMG waveforms are reported in Figs. 1 and 2.
The optimal number of iterations was 201 for model 1, 364 for model 2, and 614 for model 3. Based on the optimal iteration number, seven predictors (two lifting and five lowering) were selected in model 1 (con vs. LBP), which achieved an out-of-bag AUC of 90.4%; nine predictors (three lifting, six lowering) were selected in model 2 (con vs. rmLBP) which achieved an out-of-bag AUC of 91.2%; and seven predictors (one lifting, six lowering) were selected in model 3 (rmLBP vs. LBP) which achieved an out-of-bag AUC of 96.7%. Table 2 provides a qualitative summary of the discriminatory value of the selected To provide another visualization to interpret the models' solutions, the group-averaged waveform values were inputted into the model, and the predicted cumulative increase in class probabilities was calculated when a change occurred across all time points (0-100% cycle) of the lifting or lowering phases (Figs. 6, 7 and 8). As one example, if a 1% unit increase in BF muscle activity were to occur across all time points during lowering, the probability of being in the LBP group increases, which plateaus after 50% of the lowering phase cycle (Fig. 6). In other words, most of the difference between groups for the BF muscle in lowering phase lie in the first half of the task duration.

Discussion
In the present study, functional kinematic and EMG predictors in a simple lifting task were used to predict LBP statuses utilizing FDboost. Our models revealed that 7-9 EMG variables collected during lifting and lowering produced excellent predictive probabilities. More importantly, our models are highly interpretable, in that a parsimonious number of physiological relationships could be understood to increase the certainty of different LBP sub-types.
The physiological relationships between muscle activation and the prediction of LBP were partially supported by existing literature, even though the latter focused on hypothesis testing rather than prediction. Falla et al. [11] reported greater EMG amplitude of the lumbar erector spinae, during sub-maximal lifting and lowering in LBP individuals compared to controls. Haddas et al. [36] reported greater EMG amplitude of the multifidus (L5 level) muscle during the early phase of maximal effort lifting in individuals with LBP compared to controls. In addition, high-density EMG revealed that LBP individuals recruited more caudal, and controls recruited more cranial regions of the lumbar erector spinae during a sub-maximal lifting task [11]. Regional differences in activation of the erector spinae at different phases during the lifting task were not reported [11]. The present study showed that greater lumbar extensor activity increased the certainty of being in the LBP group, which concurred with Falla et al. [11]. However, we also found that more activity in the cranial Longis muscle in the early lifting phase and more activity in the caudal Ileoc muscle in the early lowering phase increased the certainty of being in the LBP group.
The evidence on whether muscular activity differs between healthy controls and individuals in LBP remission is conflicting in part because of the varying motor tasks investigated. When performing 90° walking turns, there were no differences in EMG amplitude of the deep multifidus, lumbar longissimus, and thoracic longissimus muscles between individuals in rmLBP and controls [37]. Yet, individuals in rmLBP demonstrated greater thoracic anterior-posterior direction of the wrist and shoulder marker, and pelvis landmark, during lifting and lowering phases. RACR right acro-mion, RWRST right wrist, con control, rmLBP remission low back pain, LBP low back pain longissimus activity than controls when sitting in a long lordotic posture [38]. Chiou et al. [39] reported no differences in EMG amplitude of the erector spinae muscles (at T12 and L4 levels) during sub-maximal and maximal prone lumbar extensions. A previous study reported greater erector spinae (L1 level) activity but similar activities of the multifidus (L5 level) and EO in individuals in LBP remission during the lifting phase, compared to controls [40]. Given that lifting as a task is commonly implicated as a risk factor for future LBP [9], and that it is a task LBP individuals often have a fear of performing [10], it is suggested that future studies should focus their efforts on collecting predictors from this task.
Pain has been shown to alter the activities of muscles both local and remote to the dominant site of pain, such that their collective response to pain may be seen as a coordinative strategy to ensure consistent task performance despite pain [41][42][43]. We observed that individuals who used more hamstrings and less upper limb muscles to lower the load had a greater certainty of being in the LBP, relative to the rmLBP group ( Table 2). The compensatory muscular strategy between the lower and upper limbs provides evidence of the importance of quantifying whole-body muscular coordination patterns as potential risk factors in LBP research. Interestingly, all studies comparing differences in muscular activity between different LBP sub-types during a lifting task have focused on measuring muscle activity only of the abdominal, trunk extensor, and hip extensor muscles [11,36,40,44]. The present findings uniquely demonstrate that activities of muscles as distal as the ankle dorsiflexors, and as proximal as the deltoids, can contribute to the accuracy of predictive models in LBP research.
Previous studies have reported altered kinematics between individuals with and without LBP at the ankle, knee, and hip joints [45] and even altered inter-segmental coordination between the lumbar-hip and between the hip-knee joints during lifting [13]. However, no differences in trunk, hip, knee, ankle linear and angular displacements, velocities, and accelerations were reported between people with and without LBP during the lifting and lowering phase of a 12 kg box from the floor [44]. Given that Lariviere If β coefficient is positive or negative over the task cycle, only the directional effect with the largest magnitude for each predictor is reported. If β coefficient has dual signs (positive and negative) over the task cycle, directional effects with the largest magnitude for each sign are reported et al. [44] reported greater thoracic erector spinae activity during the lifting and lowering phases in people with LBP compared to controls, taken with the present findings, muscular activation strategies may be more useful biomarkers of an individual's spinal health than kinematic predictors. It remains to be investigated whether EMG predictors of LBP status would be selected once inclusion of a thorough set of functional kinematic predictors is considered. Despite the optimism of the models present, the current study has some limitations. First, this study had a relatively small sample size compared to the number of predictors. The number of participants in the present study was, however, comparable to other similar researches in clinical biomechanics (n = 41 in [46], n = 44 in [47]). Results from the study will enable future researchers to fit the presently reported model's learning curve to inverse power law models [48] and to estimate the sample size needed to achieve a desired classification performance. Second, the variables used for model building were collected in a single session, which although reflected a typical clinical assessment scenario, may not reflect normal movement behaviour in daily living. With more advance wearable sensor technology emerging which allows remote biomechanics analysis [49], the methods employed presently can be exploited to yield statistical models which have greater ecological validity, and ultimately better personalized predictive accuracy. Ileoc iliocostalis lumborum, Longis longissimus thoracis pars thoracis, TA tibialis anterior, VL vastus lateralis, con control, lbp low back pain Third, we did not include age as a predictor during model building even though individuals with LBP and rmLBP were significantly older than controls. Even though previous studies have reported age-related differences in movement and control strategies [50,51], these differences were examined between adult cohorts with ~ 40 years in age gap. Lastly, given the cross-sectional nature of this study, we view the current work through the lens of a "hypothesis generation" framework, where we explored the predictive value of functional kinematic and EMG variables in a LBP setting. Future work would be to build predictive models using functional movement variables and subjective reports of pain, psychological function and physical function, and patient characteristics in a prospective cohort setting.
The ability to transform time-varying physiological differences into differences in clinical outcomes could be used in future research to predict those likely to develop LBP. Whilst this is not the focus of the current work, it certainly provides the foundation to examine this in a longitudinal study. Although it is interesting to use such models to predict who has or doesn't have LBP, clinically it is certainly more relevant to be able to predict who is likely to develop LBP, persistence, and its relapse and identify the dominant movement impairments that drive the increased risk that would  have direct implications for management and preventative strategies. FDboost can also be extended to performing regression analysis, where the outcome is a continuous variable (e.g. fear-avoidance levels). Given that explicit reports may not correspond to actual implicit measurements of an individual's psychophysical status [10], currently, it is not feasible to perform implicit measurements of an individual's psychophysical status in a clinical environment due to the lengthy time involved. In such a scenario, a statistical model can be trained in a laboratory environment that predicts implicit psychophysical levels using objectively collected movement variables and measure the pertinent movement variables clinically to predict their likely implicit psychophysical levels. Fig. 6 Predicted cumulative probability of being in the LBP group given an input of each group's (con and LBP) average waveform for each selected predictor in model 1. a Lifting and b lowering predictors. Probabilities reflect the additive increase in certainty given the observed difference between groups in EMG amplitude for every 1% of the movement cycle. BicepsB biceps brachii, BicepsF biceps femoris, EO external oblique, Ileoc iliocostalis lumborum, Longis longissimus thoracis pars thoracis, TA tibialis anterior, VL vastus lateralis, con control, LBP low back pain

Conclusions
Our approach of using functional kinematic and EMG variables collected in a simple, yet clinically relevant task such as lifting, in conjunction with FDboost, produced clinically interpretable models that retain good to excellent predictive capability. If the approach used presently can be extended to include predictors collected using wearable sensors, then our models could have great promise in delivering the breakthrough in predictive performance that can be feasibly implemented in a busy clinical environment. Fig. 7 Predicted cumulative probability of being in the rmLBP group given an input of each group's (con and rmLBP) average waveform for each selected predictor in model 2. a Lifting and b lowering predictors. Probabilities reflect the additive increase in certainty given the observed difference between groups in EMG amplitude for every 1% of the movement cycle. BicepsF biceps femoris, Delt deltoids, GL gastrocnemius lateralis, GMax gluteus maximus, Longis longissimus thoracis pars thoracis, SOL soleus, VL vastus lateralis, con control, rmLBP remission low back pain 1 3 Fig. 8 Predicted cumulative probability of being in the LBP group given an input of each group's (rmLBP and LBP) average waveform for each selected predictor in model 3. a Lifting and b lowering predictors. Probabilities reflect the additive increase in certainty given the observed difference between groups in EMG amplitude for every 1% of the movement cycle. BicepsB biceps brachii, BicepsF biceps femoris, Delt deltoids, Ileoc iliocostalis lumborum, Longis longissimus thoracis pars thoracis, LatsD latissimus dorsi, ST semitendinosus, rmLBP remission low back pain, LBP low back pain Funding This study was supported by funding from the EUROSPINE Task Force Research pilot study Grant (ID 2018_10).

Compliance with ethical standards
Conflict of interest All authors declare that they have no conflicts of interest.
Ethical approval Ethical approval was obtained from the Human Research Ethics Committee of the University of Birmingham, UK (ERN_17-1717).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.