Psychometric properties of chronic low back pain diagnostic classification systems: a systematic review

Abstract

Objectives

To identify and critically appraise studies evaluating the psychometric properties of functionally oriented diagnostic classification systems for non-specific chronic low back pain (NS-CLBP).

Methods

This review followed the PRISMA guidelines. Electronic databases and journals (PubMed, EMBASE, Cochrane, PEDro, CINAHL, Index to Chiropractic Literature, ProQuest, Physical Therapy, Journal of Physiotherapy, Canadian Physiotherapy, and Physiotherapy Theory and Practice) were searched from inception until January 2020. Included studies evaluated the validity and reliability of NS-CLBP diagnostic classification systems in adults. Risk of bias was assessed using a critical appraisal tool.

Results

Twenty-two studies were eligible: five investigated inter-rater reliability, and 17 analyzed the validity of O’Sullivan’s classification system (OCS, n = 15), the motor control impairment (MCI) test battery (n = 1), and the Pain Behavior Assessment (PBA, n = 1). Evidence from multiple low risk of bias studies demonstrates that the OCS has moderate to excellent inter-rater reliability (kappa > 0.4). In addition, two low risk of bias studies support the OCS-MCI subcategory. Three tests within the MCI test battery show acceptable inter- and intra-rater reliability for clinical use (the “sitting knee extension,” “one leg stance,” and “pelvic tilt” tests). Evidence for the reliability and validity of the PBA is limited to one study at high risk of bias.
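
The kappa values cited above can be illustrated with a minimal sketch. It computes Cohen's kappa for two raters who each assign one category per patient, then maps the value to the descriptive bands of Landis and Koch [47]. The ratings and the category labels A, B, and C are hypothetical and are not data from any included study; the review's own wording ("moderate to excellent") may use slightly different labels for the same ranges.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per subject."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_band(kappa):
    """Descriptive label following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical classifications of ten patients by two raters (not study data).
rater_a = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"]
rater_b = ["A", "A", "B", "C", "C", "C", "A", "B", "B", "A"]

kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f} ({landis_koch_band(kappa)})")
```

In this hypothetical example the raters agree on 8 of 10 patients, while agreement expected by chance is 0.34, so kappa is (0.80 − 0.34) / (1 − 0.34), or about 0.70.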

Conclusions

Multiple low risk of bias studies demonstrate strong inter-rater reliability for the OCS, specifically the OCS-MCI subcategory. Future studies with low risk of bias are needed to evaluate the reliability and validity of the MCI test battery and the PBA.

References

  1. Hartvigsen J, Hancock MJ, Kongsted A et al (2018) What low back pain is and why we need to pay attention. Lancet 391:2356–2367. https://doi.org/10.1016/S0140-6736(18)30480-X

  2. Woolf AD, Pfleger B (2003) Burden of major musculoskeletal conditions. Bull World Health Organ 81:646–656

  3. Buchbinder R, van Tulder M, Öberg B et al (2018) Low back pain: a call for action. Lancet 391:2384–2388. https://doi.org/10.1016/S0140-6736(18)30488-4

  4. da C Menezes Costa L, Maher CG, Hancock MJ et al (2012) The prognosis of acute and persistent low-back pain: a meta-analysis. CMAJ 184:E613–E624. https://doi.org/10.1503/cmaj.111271

  5. Burton AK, McClune TD, Clarke RD, Main CJ (2004) Long-term follow-up of patients with low back pain attending for manipulative care: outcomes and predictors. Man Ther 9:30–35. https://doi.org/10.1016/s1356-689x(03)00052-3

  6. Wáng YXJ, Wu A-M, Ruiz Santiago F, Nogueira-Barbosa MH (2018) Informed appropriate imaging for low back pain management: a narrative review. J Orthop Transl 15:21–34. https://doi.org/10.1016/j.jot.2018.07.009

  7. Hancock MJ, Maher CG, Latimer J et al (2007) Systematic review of tests to identify the disc, SIJ or facet joint as the source of low back pain. Eur Spine J 16:1539–1550. https://doi.org/10.1007/s00586-007-0391-1

  8. Maher C, Underwood M, Buchbinder R (2017) Non-specific low back pain. Lancet 389:736–747. https://doi.org/10.1016/S0140-6736(16)30970-9

  9. Balagué F, Mannion AF, Pellisé F, Cedraschi C (2012) Non-specific low back pain. Lancet 379:482–491. https://doi.org/10.1016/S0140-6736(11)60610-7

  10. Vining RD, Minkalis AL, Shannon ZK, Twist EJ (2019) Development of an evidence-based practical diagnostic checklist and corresponding clinical exam for low back pain. J Manipulative Physiol Ther 42:665–676. https://doi.org/10.1016/j.jmpt.2019.08.003

  11. Patel S, Friede T et al (2012) Systematic review of randomized controlled trials of clinical prediction rules for physical therapy in low back pain. Spine. https://doi.org/10.1097/BRS.0b013e31827b158f

  12. Amundsen PA, Evans DW, Rajendran D et al (2018) Inclusion and exclusion criteria used in non-specific low back pain trials: a review of randomised controlled trials published between 2006 and 2012. BMC Musculoskelet Disord 19:113. https://doi.org/10.1186/s12891-018-2034-6

  13. Foster NE, Hill JC, Hay EM (2011) Subgrouping patients with low back pain in primary care: are we getting any better at it? Man Ther 16:3–8. https://doi.org/10.1016/j.math.2010.05.013

  14. Petersen T, Laslett M, Thorsen H et al (2003) Diagnostic classification of non-specific low back pain. A new system integrating patho-anatomic and clinical categories. Physiother Theory Pract 19:213–237. https://doi.org/10.1080/09593980390246760

  15. Vining R, Potocki E, Seidman M, Morgenthal P (2013) An evidence-based diagnostic classification system for low back pain. J Can Chiropr Assoc 57:189–204

  16. Spitzer WO, LeBlanc FE, Dupuis M, Abenhaim L, Belanger AY, Bloch R, Bombardier C, Cruess RL, Drouin G, Duval-Hesler N, Laflamme J, Lamoureux G, Nachemson A, Page JJ, Rossignol M, Salmi LR, Salois-Arsenault S, Suissa SW-DS (1987) Scientific approach to the assessment and management of activity-related spinal disorders. A monograph for clinicians. Report of the Quebec Task Force on Spinal Disorders. Spine 12:S1-59

  17. Alrwaily M, Timko M, Schneider M et al (2016) Treatment-based classification system for low back pain: revision and update. Phys Ther 96:1057–1066. https://doi.org/10.2522/ptj.20150345

  18. Cosio D, Lin E (2018) Role of active versus passive complementary and integrative health approaches in pain management. Glob Adv Heal Med 7:216495611876849. https://doi.org/10.1177/2164956118768492

  19. Alhowimel A, AlOtaibi M, Radford K, Coulson N (2018) Psychosocial factors associated with change in pain and disability outcomes in chronic low back pain patients treated by physiotherapist: a systematic review. SAGE Open Med. https://doi.org/10.1177/2050312118757387

  20. Booth A, Clarke M, Dooley G et al (2012) The nuts and bolts of PROSPERO: an international prospective register of systematic reviews. Syst Rev 1:2. https://doi.org/10.1186/2046-4053-1-2

  21. Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6:e1000097. https://doi.org/10.1371/journal.pmed.1000097

  22. Brink Y, Louw QA (2012) Clinical instruments: reliability and validity critical appraisal. J Eval Clin Pract 18:1126–1132. https://doi.org/10.1111/j.1365-2753.2011.01707.x

  23. May S, Littlewood C, Bishop A (2006) Reliability of procedures used in the physical examination of non-specific low back pain: a systematic review. Aust J Physiother 52:91–102. https://doi.org/10.1016/S0004-9514(06)70044-7

  24. May S, Chance-Larsen K, Littlewood C et al (2010) Reliability of physical examination tests used in the assessment of patients with shoulder problems: a systematic review. Physiotherapy 96:179–190

  25. Barrett E, McCreesh K, Lewis J (2014) Reliability and validity of non-radiographic methods of thoracic kyphosis measurement: a systematic review. Man Ther 19:10–17. https://doi.org/10.1016/j.math.2013.09.003

  26. Vibe Fersum K, O’Sullivan PB, Kvale A, Skouen JS (2009) Inter-examiner reliability of a classification system for patients with non-specific low back pain. Man Ther 14:555–561. https://doi.org/10.1016/j.math.2008.08.003

  27. Luomajoki H, Kool J (2007) Reliability of movement control tests in the lumbar spine. BMC Musculoskelet Disord 8:90. https://doi.org/10.1186/1471-2474-8-90

  28. Dankaerts W, O’Sullivan PB, Straker LM et al (2006) The inter-examiner reliability of a classification method for non-specific chronic low back pain patients with motor control impairment. Man Ther 11:28–39. https://doi.org/10.1016/j.math.2005.02.001

  29. Enoch F, Kjaer P, Elkjaer A et al (2011) Inter-examiner reproducibility of tests for lumbar motor control. BMC Musculoskelet Disord 12:114. https://doi.org/10.1186/1471-2474-12-114

  30. O’Sullivan PB, Mitchell T, Bulich P et al (2006) The relationship between posture and back muscle endurance in industrial workers with flexion-related low back pain. Man Ther 11:264–271. https://doi.org/10.1016/j.math.2005.04.004

  31. O’Sullivan K, Verschueren S, Van Hoof W et al (2013) Lumbar repositioning error in sitting: healthy controls versus people with sitting-related non-specific chronic low back pain (flexion pattern). Man Ther 18:526–532. https://doi.org/10.1016/j.math.2013.05.005

  32. O’Sullivan PB, Beales DJ, Beetham JA et al (2002) Altered motor control strategies in subjects with sacroiliac joint pain during the active straight-leg-raise test. Spine 27:E1-8. https://doi.org/10.1097/00007632-200201010-00015

  33. O’Sullivan PB, Burnett A, Floyd AN et al (2003) Lumbar repositioning deficit in a specific low back pain population. Spine 28:1074–1079. https://doi.org/10.1097/01.BRS.0000061990.56113.6F

  34. Hungerford B, Gilleard W, Hodges P (2003) Evidence of altered lumbopelvic muscle recruitment in the presence of sacroiliac joint pain. Spine 28:1593–1600. https://doi.org/10.1097/00007632-200307150-00022

  35. Burnett A, Cornelius M, Dankaerts W, O’Sullivan P (2004) Spinal kinematics and trunk muscle activity in cyclists: a comparison between healthy controls and non-specific chronic low back pain subjects—a pilot investigation. Man Ther 9:211–219. https://doi.org/10.1016/j.math.2004.06.002

  36. Dankaerts W, O’Sullivan P, Burnett A, Straker L (2006) Differences in sitting postures are associated with nonspecific chronic low back pain disorders when patients are subclassified. Spine 31:698–704. https://doi.org/10.1097/01.brs.0000202532.76925.d2

  37. Dankaerts W, O’Sullivan P, Burnett A, Straker L (2006) Altered patterns of superficial trunk muscle activation during sitting in nonspecific chronic low back pain patients: importance of subclassification. Spine 31:2017–2023. https://doi.org/10.1097/01.brs.0000228728.11076.82

  38. Dankaerts W, O’Sullivan P, Burnett A et al (2009) Discriminating healthy controls and two clinical subgroups of nonspecific chronic low back pain patients using trunk muscle activation and lumbosacral kinematics of postures and movements: a statistical classification model. Spine 34:1610–1618. https://doi.org/10.1097/BRS.0b013e3181aa6175

  39. Beales DJ, O’Sullivan PB, Briffa NK (2009) Motor control patterns during an active straight leg raise in chronic pelvic girdle pain subjects. Spine 34:861–870. https://doi.org/10.1097/BRS.0b013e318198d212

  40. Sheeran L, Sparkes V, Caterson B et al (2012) Spinal position sense and trunk muscle activity during sitting and standing in nonspecific chronic low back pain: classification analysis. Spine 37:E486–E495. https://doi.org/10.1097/BRS.0b013e31823b00ce

  41. Van Hoof W, Volkaerts K, O’Sullivan K et al (2012) Comparing lower lumbar kinematics in cyclists with low back pain (flexion pattern) versus asymptomatic controls—field study using a wireless posture monitoring system. Man Ther 17:312–317. https://doi.org/10.1016/j.math.2012.02.012

  42. Hemming R, Sheeran L, van Deursen R, Sparkes V (2019) Investigating differences in trunk muscle activity in non-specific chronic low back pain subgroups and no-low back pain controls during functional tasks: a case-control study. BMC Musculoskelet Disord 20:459. https://doi.org/10.1186/s12891-019-2843-2

  43. Hemming R, Sheeran L, van Deursen R, Sparkes V (2017) Non-specific chronic low back pain: differences in spinal kinematics in subgroups during functional tasks. Eur Spine J. https://doi.org/10.1007/s00586-017-5217-1

  44. Sheeran L, Sparkes V, Whatling G et al (2019) Identifying non-specific low back pain clinical subgroups from sitting and standing repositioning posture tasks using a novel Cardiff Dempster–Shafer theory classifier. Clin Biomech. https://doi.org/10.1016/j.clinbiomech.2019.10.004

  45. Biele C, Moller D, von Piekartz H et al (2019) Validity of increasing the number of motor control tests within a test battery for discrimination of low back pain conditions in people attending a physiotherapy clinic: a case–control study. BMJ Open 9:e032340. https://doi.org/10.1136/bmjopen-2019-032340

  46. Meyer K, Klipstein A, Oesch P et al (2016) Development and validation of a pain behavior assessment in patients with chronic low back pain. J Occup Rehabil 26:103–113. https://doi.org/10.1007/s10926-015-9593-2

  47. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174

  48. Ford J (2003) A systematic review on methodology of classification system research for low back pain. In: Musculoskeletal physiotherapy Australia 13th biennial conference, Sydney, Australia, 2003

  49. Anderson JA (1977) Problems of classification of low-back pain. Rheumatol Rehabil 16:34–36. https://doi.org/10.1093/rheumatology/16.1.34

  50. Deyo RA, Haselkorn J, Hoffman R, Kent DL (1994) Designing studies of diagnostic tests for low back pain or radiculopathy. Spine 19:2057S-2065S. https://doi.org/10.1097/00007632-199409151-00007

  51. Fairbank JCT, Pynsent PB (1992) Syndromes of back pain and their classification. In: The Lumbar spine and back pain. Edinburgh: Churchill Livingstone

  52. Petersen T, Thorsen H, Manniche C, Ekdahl C (1999) Classification of non-specific low back pain: a review of the literature on classifications systems relevant to physiotherapy. Phys Ther Rev 4:265–281. https://doi.org/10.1179/108331999786821690

  53. Ford J, Story I, O’Sullivan P, McMeeken J (2007) Classification systems for low back pain: a review of the methodology for development and validation. Phys Ther Rev 12:33–42

  54. Woolf CJ, Bennett GJ, Doherty M et al (1998) Towards a mechanism-based classification of pain. Pain 77:227–229

  55. McCarthy CJ, Arnall FA, Strimpakos N et al (2004) The biopsychosocial classification of non-specific low back pain: a systematic review. Phys Ther Rev 9:17–30. https://doi.org/10.1179/108331904225003955

  56. Fairbank J, Gwilym S, France J, Daffner S (2011) The role of classification of chronic low back pain. Spine 1:36. https://doi.org/10.1097/BRS.0b013e31822ef72c

  57. Salvioli S, Pozzi A, Testa M (2019) Movement control impairment and low back pain: state of the art of diagnostic framing. Medicina (Kaunas). https://doi.org/10.3390/medicina55090548

  58. Carlsson H, Rasmussen-Barr E (2013) Clinical screening tests for assessing movement control in non-specific low-back pain. A systematic review of intra-and inter-observer reliability studies. Man Ther 18:103–110. https://doi.org/10.1016/j.math.2012.08.004

  59. Murphy SE, Blake C, Power CK, Fullen BM (2016) Comparison of a stratified group intervention (STarT back) with usual group care in patients with low back pain: a nonrandomized controlled trial. Spine 41:645–652. https://doi.org/10.1097/BRS.0000000000001305

  60. Mjøsund HL, Boyle E, Kjaer P et al (2017) Clinically acceptable agreement between the ViMove wireless motion sensor system and the Vicon motion capture system when measuring lumbar region inclination motion in the sagittal and coronal planes. BMC Musculoskelet Disord 18:124. https://doi.org/10.1186/s12891-017-1489-1

  61. Gracovetsky S, Newman N, Pawlowsky M et al (1995) A database for estimating normal spinal motion derived from noninvasive measurements. Spine 20:1036–1046. https://doi.org/10.1097/00007632-199505000-00010

  62. Mannion AF, Knecht K, Balaban G et al (2004) A new skin-surface device for measuring the curvature and global and segmental ranges of motion of the spine: reliability of measurements and comparison with data reviewed from the literature. Eur Spine J 13:122–136. https://doi.org/10.1007/s00586-003-0618-8

  63. Öztuna D, Elhan AH, Tüccar E (2006) Investigation of four different normality tests in terms of type 1 error rate and power under different distributions. Turkish J Med Sci 36:171–176

  64. Thode HC (2002) Testing for normality. Statistics: textbooks and monographs, vol 164. CRC Press, New York, NY

Author information

Corresponding author

Correspondence to Ahmed Omar Abdelnaeem.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Search strategies of the searched databases and journals

Database/journal | Last citation no. | Keywords
PubMed | 364 | (((((Non specific OR non-specific OR nonspecific OR mechanical))) AND ((low back pain OR simple backache OR lumbar strain OR spinal degeneration))) AND ((clinical test OR clinical examination OR clinical sign))) AND ((valid* OR reliabl*)) (simple search)
EMBASE | 738 | (clinical) AND (test* OR exam* OR sign*) AND (non-specific OR nonspecific OR 'non specific' OR mechanical OR simple) AND (low back pain OR back pain OR LBP) AND (reliab* OR valid*) (English only, limited to humans, searching in EMBASE only)
Cochrane | 92 | (Non specific or non-specific or nonspecific or mechanical) and (low back pain or simple backache or lumbar strain or spinal degeneration) and (clinical test or clinical examination or clinical sign) and (valid* or reliabl*) (in Search Manager, limited to Trials, Methods Studies, Technology Assessments and Economic Evaluations; word variations were searched)
PEDro | 226 | Non specific low back pain (abstract and title; advanced search; method: clinical trials)
CINAHL | 286 | (Non specific OR non-specific OR nonspecific OR mechanical) AND (low back pain OR simple backache OR lumbar strain OR spinal degeneration) AND (clinical test OR clinical examination OR clinical sign) AND (valid* OR reliabl*) (advanced search)
ProQuest | 358 | (("non specific" OR "non-specific" OR "nonspecific" OR "mechanical back pain") AND ("back pain" OR "lumbar strain" OR "simple backache") AND ("clinical test" OR "clinical examination" OR "clinical sign") AND ("valid*" OR "reliab*")) AND la.exact("ENG")
Physical Therapy journal | 460 | "non specific" "non-specific" "nonspecific" "mechanical back pain" "back pain" "lumbar strain" "simple backache" "clinical test" "clinical examination" "clinical sign" "valid*" "reliab*"
Chiroindex | 79 | "non specific" "non-specific" "nonspecific" "mechanical back pain" "back pain" "lumbar strain" "simple backache" "clinical test" "clinical examination" "clinical sign" "valid*" "reliab*"
Australian Journal of Physiotherapy | 54 | Non specific in Title/Abs/Keywords OR nonspecific in Title/Abs/Keywords OR non-specific in Title/Abs/Keywords AND Low Back Pain in Title/Abs/Keywords OR Mechanical low back pain in Title/Abs/Keywords OR simple backache in Title/Abs/Keywords
Canadian Physiotherapy | 113 | Non specific OR non-specific OR nonspecific OR mechanical AND low back pain AND clinical tests OR clinical examination OR clinical sign AND valid* OR reliabl* (advanced search)
Physiotherapy Theory and Practice journal | 113 | Non specific OR non-specific OR nonspecific OR mechanical AND low back pain AND clinical test OR clinical examination OR clinical sign AND valid* OR reliabl*
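
Most of the search strings above share one structure: four concept groups (specificity, condition, clinical test, psychometric property), with synonyms joined by OR inside each group and the groups joined by AND. The sketch below shows how such a string could be assembled and reproduces the CINAHL query text. The names `CONCEPT_GROUPS` and `build_boolean_query` are hypothetical, introduced only for illustration; database-specific limits and field tags are not modeled.

```python
# Concept groups taken from the CINAHL row of Appendix 1.
CONCEPT_GROUPS = [
    ["Non specific", "non-specific", "nonspecific", "mechanical"],
    ["low back pain", "simple backache", "lumbar strain", "spinal degeneration"],
    ["clinical test", "clinical examination", "clinical sign"],
    ["valid*", "reliabl*"],
]


def build_boolean_query(concept_groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    if not concept_groups or any(not group for group in concept_groups):
        raise ValueError("every concept group needs at least one term")
    blocks = ["(" + " OR ".join(group) + ")" for group in concept_groups]
    return " AND ".join(blocks)


print(build_boolean_query(CONCEPT_GROUPS))
```

Keeping each group in parentheses matters because databases differ in how they resolve unparenthesized mixes of AND and OR.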

Appendix 2

Systematic review critical appraisal tool (Reproduced from Brink and Louw (2011))

Item 1: If human subjects were used, did the authors give a detailed description of the sample of subjects used to perform the (index) test on?

Why the criterion should be evaluated: The validity and reliability of a test will be affected by the sample characteristics or composition, and therefore, the study has to report on the sample characteristics because the validity and reliability scores will then only be applicable to that particular population. A study does not contribute to validity and reliability testing if the subjects were not recruited appropriately
This item can be scored yes if:
1 the sample characteristics (e.g., height, weight, age, diagnosis and symptom status) were described or the manner of recruiting subjects was stated or if selection criteria were applied
If none of the above have been described or if insufficient information was provided, select “no.” If non-human or inanimate objects were used, select “N/A”

Item 2: Did the authors clarify the qualification, or competence of the rater(s) who performed the (index) test?

Why the criterion should be evaluated: The amount of experience of the rater(s), performing the (index) test, will influence the validity and reliability scores and needs to be explained
This item can be scored yes if:
1 the rater(s) characteristics (e.g., qualification, specialization and amount of experience using the instrument under investigation) have been described
If the above have not been described or insufficient information was provided, select “no”

Item 3: Was the reference standard explained?

Why the criterion should be evaluated: The index test scores need to be compared to the scores obtained from the reference standard in order to test validity, and therefore, the reference standard needs to be explained appropriately
This item can be scored yes if:
1 the reference standard is likely to produce correct measurements;
2 the reference standard is the best method available; and
3 details (name of the instrument, references to the accuracy of the instrument) of the reference standard are reported
If none of the above is applicable to the reference standard’s description, then select “no”

Item 4: If inter-rater reliability was tested, were raters blinded to the findings of other raters?

Why the criterion should be evaluated: When raters have access to the findings of other raters, it compromises the quality of the reliability testing procedure by inflating the agreement among the raters, and therefore, blinding needs to be performed
This item can be scored yes if:
1 it is stated that the raters were blinded to each other’s findings or if a description that implies that the raters were blinded was reported
If no information is provided, then select “no.” If intra-rater reliability was examined, then select “N/A”

Item 5: If intra-rater reliability was tested, were raters blinded to their own prior findings of the test under evaluation?

Why the criterion should be evaluated: If raters have knowledge of their prior own findings, it will influence the findings of their repeated measurements and could inflate the rater agreement, and therefore, appropriate measures, depending on the characteristics or the study design of the research study, need to be applied to ensure blinding
This item can be scored yes if:
1 the rater(s) examined the same subjects on more than one occasion and it was stated whether the rater(s) was/were blinded to the subjects they had examined previously
If insufficient information is provided, then select “no.” If inter-rater reliability was examined, then select “N/A”

Item 6: Was the order of examination varied?

Why the criterion should be evaluated: Varying the order in which the raters examine the subjects when inter-rater reliability is tested reduces the risk of systematic bias. Varying the order in which subjects are examined by one rater when intra-rater reliability is tested reduces the risk of the rater recalling the previous test scores and so reduces bias
This item can be scored yes if:
1 the order in which subjects were tested varied between raters if inter-rater reliability was tested;
2 the order of subjects was varied when intra-rater reliability was tested
If insufficient information is provided, then select “no.” If varied order of examination is unnecessary or impractical (e.g., rater(s) digitizing or reading X-rays) then select “N/A”

Item 7: If human subjects were used, was the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests?

Why the criterion should be evaluated: The index test and the reference standard should be performed at the same time; however, this is not always possible. It is therefore important to know whether the test variable could have changed between the two tests, because any change will affect the index test’s validity performance
This item can be scored yes if:
1 results from the index test and the reference standard were collected on the same subjects at the same time;
2 a delay between measurements occurred, but the target condition is unlikely to have changed between measurements
If the time period between performing the index test and the reference standard was sufficiently long that the target condition may have changed between the two tests or if insufficient information is provided, then select “no.” If non-human or inanimate objects were used, then select “N/A”

Item 8: Was the stability (or theoretical stability) of the variable being measured considered when determining the suitability of the time interval between repeated measures?

Why the criterion should be evaluated: For reliability, the test variable should not change between repeated measures, otherwise it will decrease the amount of agreement obtained between and within the rater(s)
This item can be scored yes if:
1 the stability of the variable is known or reported, and reviewers then decide on an appropriate time interval between repeated measures (stability of a test variable can only be determined if there is a reference standard);
2 there is no reference standard, in which case the reviewers should agree upon the theoretical stability of the variable and decide on an appropriate time interval between repeated measures
If insufficient information is provided, then select “no”

Item 9: Was the reference standard independent of the index test?

Why the criterion should be evaluated: If the reference standard and the index test are not independently performed, then the index test cannot replace the reference standard on its own
This item can be scored yes if:
1 it is clear from the study that the index test did not form part of the reference standard
If it appears that the index test formed part of the reference standard, then select “no”

Item 10: Was the execution of the (index) test described in enough detail to permit replication of the test?

Why the criterion should be evaluated: Variations in the execution of the reference standard and the (index) test might affect the agreement between the two tests and it is also important to be able to replicate the same study procedure in another setting when needed
This item can be scored yes if:
1 the study reported a clear description of the measurement procedure (e.g., the positioning of the instrument or rater and execution sequence of events);
2 citations of methodology were supplied
The extent to which details are expected to be reported depends on the ability of different procedures to influence the results and on the type of instrument or test under evaluation
If insufficient information is provided, then select “no”

Item 11: Was the execution of the reference standard described in enough detail to permit its replication?

Why the criterion should be evaluated: For the same reason as item 10
This item can be scored yes if:
1 the study reported a clear description of the measurement procedure (e.g., the positioning of the instrument or rater and execution sequence of events);
2 citations were supplied
If insufficient information is provided, then select “no”

Item 12: Were withdrawals from the study explained?

Why the criterion should be evaluated: The sample composition will influence the validity and reliability performance of the (index) test; therefore, it is important to know whether any withdrawals from the sample might have changed the composition of the sample
This item can be scored yes if:
1 it is clear what happened to all subjects who entered the study;
2 subjects who entered but did not complete the study are considered
If it appears that subjects who entered but did not complete the study were not accounted for or if insufficient information is provided, then select “no.” If non-human or inanimate objects were used, then select “N/A”

Item 13: Were the statistical methods appropriate for the purpose of the study?

Why the criterion should be evaluated: The aim of validity and reliability studies is to report on an estimate of validity and reliability for the particular test and appropriate statistical methods need to be implemented in order to produce this estimate
This item can be scored yes if:
1 the analysis is appropriate in terms of the type of data (e.g., categorical, continuous and dichotomous);
2 statistical analysis for validity studies incorporates, for example, means, differences between measurements, 95% confidence interval and ANOVA; and
3 statistical analysis for reliability studies incorporates, for example, intraclass correlation coefficient and 95% confidence interval
If the analysis is not appropriate or if insufficient information was provided, then select “no”
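
The scoring of the 13 items for a single study can be sketched as follows. This is a minimal illustration, not the procedure used in the review: the function name `summarise_appraisal` and the example scores are hypothetical, and the sketch deliberately reports only the share of applicable items scored “yes,” because this appendix does not state how item scores were converted into a low or high risk of bias rating. “N/A” is accepted only for the items whose wording above offers it (items 1, 4, 5, 6, 7, and 12).

```python
ALLOWED_SCORES = {"yes", "no", "n/a"}
# Items whose wording in Appendix 2 offers an "N/A" option.
NA_ALLOWED_ITEMS = {1, 4, 5, 6, 7, 12}


def summarise_appraisal(item_scores):
    """Tally one study's scores; item_scores maps item number (1-13) to a score."""
    if set(item_scores) != set(range(1, 14)):
        raise ValueError("scores are required for items 1 to 13 exactly")
    for item, score in item_scores.items():
        if score not in ALLOWED_SCORES:
            raise ValueError(f"item {item}: unknown score {score!r}")
        if score == "n/a" and item not in NA_ALLOWED_ITEMS:
            raise ValueError(f"item {item} has no N/A option")
    # Items scored N/A are excluded from the denominator.
    applicable = [s for s in item_scores.values() if s != "n/a"]
    yes_count = applicable.count("yes")
    return {
        "yes": yes_count,
        "applicable": len(applicable),
        "percent_yes": 100 * yes_count / len(applicable),
    }


# Hypothetical scores for one inter-rater reliability study (not review data).
example_scores = {
    1: "yes", 2: "yes", 3: "no", 4: "yes", 5: "n/a", 6: "no", 7: "n/a",
    8: "yes", 9: "no", 10: "yes", 11: "no", 12: "yes", 13: "yes",
}
summary = summarise_appraisal(example_scores)
print(f"{summary['yes']} of {summary['applicable']} applicable items scored yes "
      f"({summary['percent_yes']:.1f}%)")
```

Excluding “N/A” items from the denominator keeps studies comparable when, for example, an inter-rater study cannot be scored on intra-rater blinding (item 5).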

Appendix 3

Classification processes of OCS

[Figure a: classification processes of OCS; image not available]

About this article

Cite this article

Abdelnaeem, A.O., Rehan Youssef, A., Mahmoud, N.F. et al. Psychometric properties of chronic low back pain diagnostic classification systems: a systematic review. Eur Spine J 30, 957–989 (2021). https://doi.org/10.1007/s00586-020-06712-0

  • DOI: https://doi.org/10.1007/s00586-020-06712-0

Keywords

  • Classification
  • Non-specific chronic low back pain
  • Pelvic girdle pain
  • Motor control
  • Psychometric properties