Introduction

Chronic low back pain (LBP) is among the leading causes of years lived with disability worldwide1. The increasing prevalence of chronic LBP among adolescents and young adults is widely reported in the literature. A recent epidemiological study reported a one-year prevalence of 42.4% among young adults2. Mechanical muscle properties such as muscle tone and stiffness are considered fundamental to muscle function and to maintaining energy-efficient muscle contractions3. Altered tone and stiffness in the lumbar myofascial region have been identified in people with chronic LBP4,5 and may be related to underlying pathologies and symptoms6. Rehabilitation interventions such as manual therapy and therapeutic exercise are commonly used to manage chronic LBP because of their reported benefits in altering muscle tone and stiffness by reducing paraspinal muscle activity7,8,9. Paraspinal muscle tone and stiffness are often assessed clinically with palpatory techniques such as manual spinal stiffness assessment10 to guide treatment decisions and appraise treatment effectiveness11. However, the reliability of manual palpatory techniques has been repeatedly criticized12,13. Furthermore, laboratory-based techniques such as diagnostic ultrasound14, magnetic resonance elastography15, ultrasonic shear wave elastography16 and electromyography are not always clinically feasible. Therefore, quantifying changes in paraspinal muscle tone and stiffness in a clinical setting remains a challenge.

The handheld myotonometer was developed as a means of objectively quantifying mechanical muscle properties. The device applies multiple short impulses to the muscle bulk via a testing probe to generate oscillations in the muscle fibres17. The oscillation waveform reflects the viscoelastic properties of the muscle. Published literature indicates that the device is reliable for assessing mechanical properties in healthy populations18,19,20,21 and in people with pathological conditions22,23,24,25,26,27,28,29,30 within a laboratory environment. Recent studies have also demonstrated the feasibility of using the handheld myotonometer to differentiate lumbar extensor fascia stiffness between young patients with ankylosing spondylitis and healthy individuals5,31. Despite these positive results, several authors have raised doubts about the reliability of myotonometers when used in pathological groups27 or in a clinical environment26. This concern was supported by two recently published studies that reported varying degrees of reliability across muscle groups and reduced reliability when the device was operated in a clinical setting25,26. Other authors have suggested that the operation of any handheld device may be influenced by the operator’s experience32, the measuring technique33 and background noise in the clinical environment27. Therefore, reliability established in one pathological population in a laboratory setting may not generalize to other pathological populations assessed in a clinical setting. In addition, in previous reliability studies the test site was marked on the skin to minimize the confounding effect of site identification when the second measurement was taken. While leaving a mark on the skin may be feasible in an inpatient setting, it is not always possible in an outpatient setting because outpatient appointments are irregular.

To our knowledge, no published data document the reliability of a handheld myotonometer for measuring paraspinal muscle tone and stiffness in young adults with chronic LBP in an outpatient setting. The reliability of the device in a clinical setting must be established before it can be considered as an outcome measure for monitoring changes in paraspinal muscle tone and stiffness. Therefore, the aim of this study was to assess the between-session intra-rater reliability of a handheld myotonometer in young adults with chronic LBP in a musculoskeletal outpatient setting.

Methods

Study Setting

This single-centre study was conducted within the Rehabilitation Outpatient Department of The First Affiliated Hospital, Sun Yat-sen University. Measurements were taken while participants lay prone on the assessment couch in a treatment cubicle of the musculoskeletal outpatient department. The assessor received three hours of training from a senior research physiotherapist with extensive experience in operating the device. The training covered test site identification, the standard operating procedure of the device and supervised practice. The assessor then had one week of unsupervised practice with healthy individuals to become familiar with the data collection protocol.

Recruitment

Participants were recruited from staff at the local institute and from students on clinical rotations there, using social media and internal announcements. Interested individuals were asked to contact a member of the research team. All potential participants were given an information sheet and encouraged to ask questions about the study. A member of the research team then screened them to confirm eligibility.

Sample population

The inclusion criteria were as follows: (1) aged 18 to 25 years, (2) persistent pain in the lumbar or lumbosacral region (between T12 and the gluteal fold) without radiation to the legs for at least 6 weeks before enrolment34, (3) no intervention received in the four weeks before enrolment.

The exclusion criteria were as follows: (1) body mass index (BMI) >30 kg/m², (2) scoliosis, (3) history of fracture or surgery in the pelvic or spinal region, (4) history of neurological conditions, (5) pregnancy, (6) medical conditions other than chronic LBP, (7) a wound in the lumbar region at the time of data collection.

Ethics

The study was approved by the Medical Ethical Committee of the First Affiliated Hospital of Sun Yat-sen University [approval no: 2016(85)]. The study was conducted in accordance with the Declaration of Helsinki. An information sheet was provided to all participants, and written informed consent was obtained from all participants. The relevant guidelines and regulations of the local institute were strictly followed. Participants were informed that they could withdraw from the study without giving a reason. All datasets generated during the current study are available from the corresponding author on reasonable request.

Instrument

A handheld myotonometer (MyotonPRO®, Estonia) was used to quantify bilateral lumbar paraspinal muscle tone and stiffness. The testing probe was placed perpendicular to the skin over the belly of the tested muscle. The probe was first pre-loaded by pressing it against the skin to the required depth. Once this depth was reached (indicated by the indicator light changing from red to green), the device applied three short impulses (one second apart) to induce damped oscillations within the muscle bulk. The oscillation pattern recorded by the transducer was used to calculate the mechanical muscle properties.

Parameters

Muscle tone and stiffness were recorded bilaterally at the L1 to L5 levels. The device expresses muscle tone as the natural oscillation frequency F (Hz), calculated as F = 1/T, where T is the period of the oscillation in seconds. Muscle stiffness (N/m) is related to the maximal acceleration of the oscillation and the tissue deformation recorded by the transducer17. The manufacturer of the handheld myotonometer has indicated that the stiffness of tissues within 2 cm below the epidermis can be measured31. This 2 cm depth is consistent with other models of soft tissue compliance meters35. The Oswestry Low Back Pain Disability Index (ODI)36 was used to assess the level of disability related to back pain. The Japanese Orthopedic Association Back Pain score (JOABP)37 was used to assess the multidimensional status of the disorder, including quality of life, pain intensity and level of disability. The numerical pain rating scale (NPRS; range 0–10) was used to record the participants' pain level at the time of data collection.
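The two device outputs described above can be illustrated numerically. The Python sketch below is a minimal illustration, not the device's firmware: tone follows the stated relation F = 1/T, while the stiffness function assumes the commonly cited form S = a_max · m / Δl (peak acceleration times the mass of the moving probe, divided by the maximal tissue deformation). That stiffness formula is an assumption here, not an equation given in this paper, and the function names, probe mass and input values are hypothetical.

```python
def tone_frequency_hz(period_s: float) -> float:
    """Natural oscillation frequency F = 1/T, where T is the oscillation period in seconds."""
    if period_s <= 0:
        raise ValueError("oscillation period must be positive")
    return 1.0 / period_s


def dynamic_stiffness_n_per_m(a_max: float, probe_mass_kg: float, deformation_m: float) -> float:
    """ASSUMED relation S = a_max * m / delta_l.

    a_max: peak acceleration of the oscillation (m/s^2)
    probe_mass_kg: mass of the moving probe (kg); hypothetical value in the example
    deformation_m: maximal tissue deformation (m)
    """
    if deformation_m <= 0:
        raise ValueError("tissue deformation must be positive")
    return a_max * probe_mass_kg / deformation_m


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the units involved.
    print(f"tone: {tone_frequency_hz(0.0625):.1f} Hz")  # 1 / 0.0625 s
    print(f"stiffness: {dynamic_stiffness_n_per_m(10.0, 0.02, 0.001):.0f} N/m")
```

A shorter oscillation period gives a higher frequency, which is why higher tone readings correspond to faster oscillations of the tissue under the probe.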

Procedure

Demographic data, including age, gender, height and weight, and clinical information about LBP were recorded at the beginning of the data collection session. Participants were asked to recall their average pain level over the previous 6 weeks. Measurements were recorded with participants lying prone with the lumbar region exposed. Test sites were identified using the method proposed in a previous study5. The assessor first located the highest point of the iliac crests to estimate the level between the L3 and L4 spinous processes. The spinous processes of L1 to L5 were then identified and marked. The test sites were marked on the most prominent part of the extensor muscle bulk at the level of each lumbar spinous process. Participants were asked to place their hands beside their head and lie comfortably to achieve full relaxation. The study assumed that, when lying prone with the trunk relaxed, participants would adopt their neutral lumbar lordosis. Measurements were taken in order from L1 to L5, first on the left side and then on the right. Participants were asked to hold their breath for five seconds at the end of inspiration to minimize the confounding effect of changes in intra-abdominal pressure during the natural respiratory cycle. The complete procedure (including test site identification) was repeated by the same assessor one week later at a similar time of day. Data were removed from the device after the first measurement session for blinding purposes and to minimize memory bias.

Data analysis

Statistical analyses were conducted using SPSS 20 (IBM, Armonk, NY, USA). The normality of the muscle tone and stiffness data was assessed using the Kolmogorov-Smirnov test and frequency histograms. Sample characteristics, including age, gender, BMI, NPRS, ODI and JOABP, were summarized with descriptive statistics. Differences in tone and stiffness among lumbar levels were assessed by repeated measures ANOVA, followed by post hoc pairwise comparisons with Bonferroni adjustment (adjusted critical value: p < 0.005). Between-day differences in paraspinal muscle tone and stiffness were assessed by paired t-tests (significance level p < 0.05). Relative intra-rater reliability was determined by the intraclass correlation coefficient model ICC(3,k). ICC values were interpreted as excellent (>0.75), fair to good (0.40–0.74) or poor (<0.40)38. Absolute reliability was determined by the standard error of measurement (SEM)39 and the smallest real difference (SRD)40. Systematic bias between measurements was assessed by Bland-Altman plots and 95% limits of agreement (LOA)41.
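For readers reproducing these analyses outside SPSS, the sketch below computes ICC(3,k) from two-way ANOVA mean squares in the Shrout and Fleiss form (BMS − EMS)/BMS, where BMS is the between-participants mean square and EMS the residual mean square. It also reproduces the Bonferroni threshold, assuming that the adjusted value of 0.005 came from all 10 pairwise comparisons among five lumbar levels (0.05/10). This is a plain-Python sketch for complete data with no missing values, not SPSS's implementation; the function names and example values are hypothetical.

```python
from math import comb
from statistics import mean


def icc_3k(scores):
    """ICC(3,k): two-way mixed model, consistency, mean of k measurements.

    scores: one row per participant, one column per session (complete data).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / bms


def bonferroni_alpha(n_levels, alpha=0.05):
    """Per-comparison threshold when every pair of levels is compared."""
    return alpha / comb(n_levels, 2)


if __name__ == "__main__":
    # Hypothetical tone values (Hz) for four participants on two days.
    day_pairs = [[15.2, 15.6], [17.9, 17.4], [14.1, 14.3], [19.0, 19.5]]
    print(f"ICC(3,k) = {icc_3k(day_pairs):.3f}")
    print(f"Bonferroni threshold for 5 levels: {bonferroni_alpha(5):.3f}")
```

Because model 3 treats the session effect as fixed, a constant shift between days does not lower ICC(3,k); only inconsistent participant-by-session variation does.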

Ethical Approval and Consent to participate

The Medical Ethical Committee of the First Affiliated Hospital of Sun Yat-sen University reviewed and approved the present study [Ethics No. 2016(85)]. Informed written consent was obtained from all participants who took part in the present study.

Results

Demographics

Thirty participants with chronic LBP were recruited in the study. The characteristics of the sample population are presented in Table 1. Table 2 presents the clinical information of the sample population.

Table 1 A summary of the demographics of all participants.
Table 2 Clinical information of the chronic LBP cohort.

Muscle tone and stiffness at different lumbar levels

Repeated measures ANOVA indicated significant differences in muscle tone and stiffness among lumbar levels (p < 0.05). Post hoc analysis with Bonferroni adjustment indicated that all pairwise differences in muscle tone and stiffness between lumbar levels were significant, except for muscle tone between L1 and L2 on the right side.

Between-days differences

The mean muscle tone and stiffness at each lumbar level recorded on the two occasions are presented in Table 3. Paired t-tests revealed that the between-day differences were not significant (p > 0.05) at any lumbar level. No significant difference was observed between left and right pooled paraspinal muscle tone and stiffness.

Table 3 Results of the ICC analysis of the chronic LBP cohort.

Intraclass correlation coefficient

The ICCs of all parameters at each lumbar level ranged from 0.81 to 0.96, indicating excellent between-day intra-rater reliability. Detailed results of the ICC analysis are presented in Table 3. The ICC of pooled muscle tone was 0.93 (95% CI: 0.91–0.95) on the left and 0.92 (95% CI: 0.88–0.94) on the right. For pooled muscle stiffness, the ICC was 0.94 (95% CI: 0.92–0.96) on both sides.

SEM and SRD

SEM values ranged from 0.2 to 0.7 Hz for muscle tone and from 7.9 to 16.5 N/m for muscle stiffness. SRD values ranged from 0.4 to 1.8 Hz for muscle tone and from 21.9 to 52.9 N/m for muscle stiffness. Table 4 presents the SEM and SRD of muscle tone and stiffness at each level.

Table 4 Results of absolute reliability indices of the chronic LBP cohort.

Bland-Altman analysis

The 95% LOA of pooled muscle tone were −2.0 to 2.1 Hz on the left and −2.4 to 2.0 Hz on the right. For pooled muscle stiffness, the 95% LOA were −79.8 to 94.7 N/m on the left and −89.5 to 79.0 N/m on the right. Bland-Altman plots (Figs 1–4) indicated no systematic bias between the two measurements. However, agreement appeared to decrease as paraspinal muscle tone and stiffness increased.

Figure 1 Bland-Altman plot of pooled left paraspinal muscle tone.

Figure 2 Bland-Altman plot of pooled right paraspinal muscle tone.

Figure 3 Bland-Altman plot of pooled left paraspinal muscle stiffness.

Figure 4 Bland-Altman plot of pooled right paraspinal muscle stiffness.

Discussion

This study is among the first to assess the reliability of a handheld myotonometer used in a musculoskeletal outpatient setting to quantify paraspinal muscle tone and stiffness in young adults with chronic LBP. The results indicated acceptable between-day intra-rater reliability. Measurement errors were small, with no systematic bias.

Paraspinal muscle mechanical properties

The present study quantified paraspinal muscle tone and stiffness at different spinal levels in young adults with chronic LBP. The results indicated a decrease in muscle tone and stiffness from L1 to L5. This decrease may be related to the anatomy of the paraspinal muscles, which run more inferiorly towards the lower lumbar levels. Measurements taken at the lower lumbar levels may therefore reflect a greater contribution from superficial soft tissue than those taken at the upper levels. This interpretation is supported by a previous MRI study indicating that the soft tissue overlying the paraspinal muscles in adults aged 22–81 years was thicker at L5 than at L1, in the absence of spinal oedema42. If this finding also applies to the present sample, it would provide a possible explanation for the decrease in muscle tone and stiffness from L1 to L5. Another possible contributing factor is the sequential measurement order, in which data were collected from L1 to L5, beginning on the left side and then progressing to the right. It is currently unknown whether such sequential measurement affects muscle properties, since most published studies of lumbar paraspinal muscle stiffness record measurements at only one level.

Intraclass correlation coefficient

The ICC reflects the degree of consistency and agreement between two ratings43; the higher the agreement between measurements, the closer the value is to 1. Some authors have suggested that existing interpretations of the ICC are based mostly on inter-rater reliability data and that higher values should be expected for intra-rater reliability44, potentially 0.8 or above for reliability to be considered acceptable25. In the present study, the ICCs for paraspinal muscle tone and stiffness at each lumbar level were above 0.8, indicating acceptable intra-rater reliability. These values were broadly consistent with, although lower than, those of a recently published study that investigated the within-session intra-rater reliability of paraspinal muscle stiffness at rest at the L4 level (ICC = 0.99)45. The lower ICCs observed in this study may be related to differences in data collection. In the study by Kelly et al., the interval between measurements was not documented, and repeated measurements appeared to be taken in close succession at the test site marked during the first recording. The findings of this study suggest that the handheld myotonometer may be a reliable way to quantify muscle stiffness in a clinical setting. The ICCs for muscle tone observed here are consistent with those reported in a study of the between-day intra-rater reliability of limb muscle tone in a clinical setting (ICCs 0.75–0.82; CIs 0.37–0.93)25. In that study and in several other published reliability studies of handheld myotonometers in clinical25,26 and laboratory settings18,19,46,47, the second measurement was taken at the location marked during the first session. Therefore, most existing studies do not account for potential error arising from site identification.
The high ICCs observed in this study indicate that the reliability of quantifying paraspinal muscle tone was unlikely to be affected by the site identification process. Despite the high ICCs for individual lumbar levels and pooled data, their interpretation is not straightforward, because few clinical data indicate whether the observed reliability levels are clinically acceptable. In addition, the lower bounds of the 95% CIs at L1 (tone) and L2 (tone and stiffness) on the right side were below 0.75, the acceptable level previously proposed48. The wide CIs likely reflect the small sample size. Thus, no firm conclusion can be drawn from the ICC analysis.

SEM and SRD

The SEM and SRD are absolute indices of an instrument's reliability. The SEM estimates how repeated measurements are distributed around the “true” score. The SRD is the smallest change that can be interpreted as a “real” change; an observed change larger than the SRD is unlikely to be due to measurement error. The smaller the SEM and SRD, the higher the reliability of the instrument. Currently, insufficient data are available for direct comparison of the SEM and SRD values of paraspinal muscles. The SEM observed in this study was at most 0.7 Hz for muscle tone and below 20 N/m for muscle stiffness. These small SEMs are consistent with a previous study of the between-day reliability of peripheral muscle tone in a stroke population in a clinical setting, in which Lo et al.25 reported SEM values of 0.76 Hz and 0.83 Hz for the biceps brachii and rectus femoris, respectively. The SEM of triceps muscle tone (0.70 Hz) recorded from a stroke population in a laboratory setting23 was also consistent with the SEM observed in the present study. The SEM and SRD of muscle tone and stiffness at bilateral L1 and L2 were higher than those at L3 to L5, indicating greater variation around the “true” score on repeated measurement and requiring larger differences to be considered real change. This is consistent with previous studies in which the myotonometer showed different reliability for different muscle groups. The reduced reliability at the upper lumbar levels may be related to changes in spinal stiffness during the respiratory cycle. The crural diaphragm attachment extends to the transverse process of L2; therefore, contraction of the diaphragm directly affects spinal stiffness49.
A previous study provided evidence that L4 stiffness does not change with lung volume when breathing within the normal tidal range, whereas L2 stiffness increases with every increment in lung volume50. The present study attempted to minimize the effect of respiration by taking measurements at the end of tidal inspiration. However, inspiratory volume was not objectively quantified, so it could not be confirmed that participants inspired the same volume on the two occasions. Differences in inspiratory volume may have affected muscle properties at L1 and L2, which in turn would influence the reliability of the readings.
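The relationship between the two absolute indices can be made concrete. A widely used formulation, assumed here because this paper does not state its exact equations, is SEM = SD·√(1 − ICC) and SRD = 1.96·√2·SEM, so that a change must exceed roughly 2.77 times the SEM to be treated as real. Under that assumption, an SEM of 7.9 N/m corresponds to an SRD of about 21.9 N/m, which matches the lowest stiffness values reported in the Results. The Python sketch below implements the assumed formulation; the function names and the SD and ICC inputs in the example are hypothetical.

```python
from math import sqrt


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, SEM = SD * sqrt(1 - ICC) (assumed formulation)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1 for this formula")
    return sd * sqrt(1.0 - icc)


def srd_from_sem(sem: float) -> float:
    """Smallest real difference at 95% confidence, SRD = 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2.0) * sem


def is_real_change(change: float, srd: float) -> bool:
    """Treat a change as real only if its magnitude exceeds the SRD."""
    return abs(change) > srd


if __name__ == "__main__":
    sem = sem_from_icc(sd=40.0, icc=0.94)   # hypothetical stiffness SD (N/m) and ICC
    srd = srd_from_sem(sem)
    print(f"SEM = {sem:.1f} N/m, SRD = {srd:.1f} N/m")
    print("30 N/m change is real:", is_real_change(30.0, srd))
```

The √2 term reflects that a change score combines the measurement error of two sessions, and 1.96 sets the 95% confidence level.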

Bland-Altman analysis

The purpose of Bland-Altman analysis is to identify systematic bias and the magnitude of disagreement between measurements. The Bland-Altman plots did not indicate systematic bias between the two measurements, although the magnitude of disagreement appeared to increase as tone and stiffness increased. This finding is consistent with a published study of the reliability of measuring the mechanical properties of the biceps brachii in elderly people with and without paratonia27, which similarly reported reduced reliability as muscle tone increased. The 95% LOA at the upper lumbar levels bilaterally were wider than those at the lower lumbar levels, a pattern consistent with the ICC, SEM and SRD findings of the present study. One difficulty in interpreting the 95% LOA is the lack of a universally accepted clinical range. A previous study of paraspinal muscle stiffness in young adults with back pain resulting from ankylosing spondylitis31 found a baseline difference of 30 N/m in muscle stiffness between the back-pain group and healthy controls. This difference was larger than the SRD and was within the 95% LOA observed in the present study. These findings indicate the handheld myotonometer’s potential to quantify mechanical muscle properties in a clinical setting. To our knowledge, no study has investigated intervention-induced changes in muscle tone and stiffness measured by myotonometer in the chronic LBP population. Therefore, there are currently insufficient published data to indicate whether the range of error observed in the present study is clinically acceptable. The findings of the present study thus provide a reference for measuring changes in paraspinal muscle tone and stiffness across different days.
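The limits of agreement discussed above follow the standard Bland-Altman construction: the mean of the between-day differences (the bias) plus or minus 1.96 times their standard deviation. The Python sketch below computes them under two stated assumptions: differences are taken as day 2 minus day 1 (the direction is not specified in this paper), and the differences are approximately normally distributed. It also returns each pair's mean, the x-axis of a Bland-Altman plot, so that a spread that widens at higher values, as observed here, can be inspected. The function name and data are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(day1, day2):
    """Return pair means, bias and 95% limits of agreement for paired readings.

    Differences are day2 - day1 (an assumed direction).
    """
    if len(day1) != len(day2) or len(day1) < 2:
        raise ValueError("need at least two paired readings of equal length")
    diffs = [b - a for a, b in zip(day1, day2)]
    pair_means = [(a + b) / 2 for a, b in zip(day1, day2)]
    bias = mean(diffs)
    sd = stdev(diffs)                     # sample SD of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    return pair_means, bias, (lower, upper)


if __name__ == "__main__":
    # Hypothetical pooled stiffness readings (N/m) from two sessions.
    day1 = [250.0, 280.0, 310.0, 265.0, 330.0]
    day2 = [255.0, 270.0, 320.0, 260.0, 345.0]
    means, bias, (lo, hi) = bland_altman(day1, day2)
    print(f"bias = {bias:.1f} N/m, 95% LOA = {lo:.1f} to {hi:.1f} N/m")
```

Plotting each difference against its pair mean would show whether disagreement grows with the magnitude of tone or stiffness; a single pair of limits, as computed here, assumes that it does not.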

Limitations

The lack of other objective measures to confirm that the muscles were in a comparable state during the two sessions may have led to underestimation of reliability. Although participants were advised to refrain from physical exercise on the day of data collection, their physical activity on that day could not be controlled. However, the study protocol closely mimicked routine clinical practice, in which it is not always possible to control the physical activity of people attending outpatient appointments.

Lumbar lordosis was not strictly controlled, and the degree of lumbar extension or flexion may have affected the readings. However, controlling lumbar lordosis by means such as strapping the participant to the examination plinth may affect relaxation and, in turn, muscle properties. In addition, controlling lumbar lordosis may introduce a further confounding factor: reproducing the same lordosis at different measurement time points. Asking the participant to lie prone with the body relaxed is common clinical practice and a frequently used method in published studies assessing lumbar spinal muscle function.

Because this study did not test the reliability of the device in participants with a range of muscle tone and stiffness levels, the findings may not be generalizable. The myotonometer technology itself is also limited: it measures not only the properties of a particular muscle but also those of the soft tissue overlying the muscle fibres. The indirect nature of the technique may therefore produce misleading measurements, since the “true” muscle properties may be masked by the stiffer fascia located superficial to the paraspinal muscles. However, a previous study found that erector spinae stiffness at rest measured by myotonometer was moderately correlated with muscle stiffness measured by elastography, and that changes in erector spinae stiffness measured by myotonometer at different contraction intensities were comparable with those measured by elastography45. Another study suggested that surface electromyography activity is concurrent with extensor myofascial tone5, although other authors have stated that the deeper multifidus is unlikely to be measured. There is no empirical evidence on whether the indentation force affects structures beneath the erector spinae. Nevertheless, this limitation should not affect the reliability analysis, since readings were compared between the two sessions rather than between lumbar levels. Further investigation is recommended to determine exactly which spinal tissues are probed by the myotonometer, in order to improve the clinical application of the device.

This study analysed the data by side of the spine (left and right) rather than by pain location. We acknowledge that this approach may obscure important information related to the painful side. However, as the study was not designed to assess differences in muscle properties between the painful and non-painful sides, it included only a small and unequal number of participants with unilateral pain. A comparison between painful and non-painful sides would therefore be unlikely to be statistically meaningful.

Conclusions

The present study demonstrated acceptable between-day intra-rater reliability when using a myotonometer to measure muscle tone and stiffness in young adults with chronic LBP in an outpatient setting. The agreement between measurements is acceptable. The error range at the L3 to L5 levels is consistent with existing literature, whereas the error range at L1 and L2 indicates that a larger change is required at these levels to be deemed a real change in muscle tone and stiffness.