Introduction

In the United States in 2010, the rate of spine radiographs within 5 days of presenting to a chiropractor was 204 per 1000 new patients [1]. An analysis of national trends in the United States suggests that the rate of spinal radiography by chiropractors and podiatrists increased by 14.4% between 2003 and 2015 [2]. This increase occurred despite the publication of several evidence-based clinical practice guidelines and clinical prediction rules to assist chiropractors in determining the indication for spine radiographs to assist with diagnosing a pathology [3,4,5,6,7]. Overall, guidelines suggest that radiographs are indicated when signs and symptoms of potentially serious underlying pathology (red flags) are identified through the clinical history and physical examination. However, on its own, an isolated “red flag” may have a high false positive rate for the diagnosis of underlying spinal pathology, such as cancer [8]. For example, the presence of a solitary “red flag” such as age over 50 years may not be sufficient to warrant taking spine radiographs [8, 9]. Therefore, clinicians are encouraged to combine sound clinical judgement and the assessment of red flags when ordering radiographs [9,10,11].

In the absence of “red flags”, the use of spinal radiographs is not recommended [3,4,5,6,7]. Nevertheless, factions of chiropractors, including the International Chiropractic Association, promote the use of routine or repeat radiographs to assess the structure and function of the spine [12,13,14]. This practice, which dates back to 1910, was initiated when no evidence was available to guide the judicious use of spine radiographs [15]. Historically, these groups of chiropractors have argued that radiographs are helpful to measure postural abnormalities, identify vertebral misalignment or subluxation and guide treatment with spinal manipulative therapy [12, 15, 16]. The belief that radiographs are useful to detect and correct spine structure and function provides the foundation for many chiropractic technique systems that are still in use today. To our knowledge, approximately 23 chiropractic techniques use spine radiography (including full spine radiography) to guide the clinical management of patients [16]. These include the Gonstead, Chiropractic BioPhysics®, Toggle-Recoil, and National Upper Cervical Chiropractic Association (NUCCA) techniques [16]. Proponents of these techniques claim that the use of routine and repeat radiographs is supported by scientific evidence and have published a guideline to assist clinicians with the biomechanical assessment of spinal subluxation in chiropractic clinical practice using radiography [13]. However, these claims have not yet been evaluated for their clinical utility, that is, the benefit a patient gains from a test or treatment [17,18,19]. This was a particular concern for the College of Chiropractors of British Columbia (CCBC), which regulates the practice of chiropractic in the province of British Columbia, Canada. The mission of the CCBC is to protect the public by regulating British Columbia’s doctors of chiropractic to ensure safe, qualified and ethical delivery of care [20].

At the request of the CCBC, we conducted an independent rapid review of the literature to investigate the clinical utility of routine and repeat radiographs (in the absence of red flags) for the structural and functional evaluation of the spine by chiropractors. Specifically, we aimed to investigate: 1) the diagnostic utility of radiographs of the cervical, thoracic or lumbar region for the structural and functional evaluation of the spine; 2) the therapeutic utility of radiographs of the cervical, thoracic or lumbar region for the structural and functional evaluation of the spine; and 3) whether functional or structural findings on repeat radiographs of the cervical, thoracic or lumbar spine are valid markers of clinically meaningful change when monitoring conditions or managing patients. Our three main research objectives required that we first determine the validity and reliability of radiographs for the structural and functional evaluation of the spine.

Methods

We conducted a rapid review of the literature. Rapid reviews are used by health decision-makers (clinicians, patients, managers, and policy makers) who need timely access to health information to plan, develop and implement health care and policies [21, 22]. To answer our questions, we used methodology recommended by the World Health Organization and previously used by our group [21, 23].

Protocol and registration

We reported our review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and PRISMA Harms checklists [24, 25]. We registered our review with the International Prospective Register of Systematic Reviews (PROSPERO) on November 12, 2019 (CRD42020158321).

Clinical utility

Clinical utility is defined as the benefit that a person has from an intervention or test [17]. Clinical utility includes diagnostic utility (the degree to which the use of a test is associated with changing health outcomes) [18] and therapeutic utility (the degree to which a test contributes to improving health outcomes through the selection of an appropriate treatment) [19]. Demonstrating that a test has clinical utility requires demonstration that patients benefit from a test in a well-designed randomized clinical trial (RCT) or cohort study [17]. However, preliminary steps are necessary before the hypothesis that a clinical test has clinical utility can be tested (Fig. 1). First, the hypothesis that a test (e.g. spine radiographs) may benefit patient care must be generated from sound clinical observations. Second, the validity, diagnostic accuracy and reliability of the clinical test must be investigated [26,27,28]. Studies of diagnostic accuracy should report the sensitivity, specificity, predictive values and likelihood ratios of the test under investigation [26, 27] [Additional file 2]. Tests that are not valid, reliable or lack diagnostic accuracy are unlikely to have clinical utility, and therefore, unlikely to benefit patients [17,18,19]. Our methodology includes an evaluation of the diagnostic accuracy and reliability of radiographs used to evaluate the structure and function of the spine by chiropractors. Finally, tests that are reliable, valid and have diagnostic accuracy must demonstrate clinical (i.e., diagnostic and therapeutic) utility, in other words, impact health outcomes.
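The diagnostic accuracy measures named above can be illustrated with a short calculation. The following is a minimal Python sketch, not part of the review methodology: the class name `TwoByTwo`, the function `accuracy_measures` and the counts are hypothetical and chosen only to show how sensitivity, specificity, predictive values and likelihood ratios follow from a two-by-two table comparing an index test with a reference standard. It assumes no cell combination produces a zero denominator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing an index test with a reference standard."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent


def accuracy_measures(t: TwoByTwo) -> dict[str, float]:
    """Return standard diagnostic accuracy measures for a 2x2 table.

    Assumes every denominator is non-zero (e.g. specificity < 1 for LR+).
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),  # positive predictive value
        "npv": t.tn / (t.tn + t.fn),  # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts, for illustration only (not from any included study).
example = TwoByTwo(tp=40, fp=30, fn=10, tn=120)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity and specificity are both 0.80, yet the positive predictive value is only about 0.57, which shows why a single accuracy measure is not enough to judge a test.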

Fig. 1 Flow of investigations leading to the determination of the clinical utility of a test

Eligibility criteria

Participants and interventions

We included studies of patients presenting to chiropractors who received spinal radiographs of the cervical, thoracic or lumbar region, in the absence of red flags.

Comparators

We considered comparisons with participants who did not receive spinal radiographs or were assessed with other spinal examination methods, such as palpation, postural evaluation or other diagnostic imaging techniques (such as CT scan or MRI).

Outcomes

We investigated structural or functional outcomes associated with various chiropractic approaches that use radiographs as diagnostic or assessment tools. Such approaches may include assessing for asymmetry in vertebral alignment as measured by line drawings, spinal curvatures, and the presence and correction of vertebral dysfunction as determined by measurement or positional listings. We also considered patient important outcomes throughout a course of treatment, including but not limited to pain, functioning, self-reported recovery, health-related quality of life, or well-being.

Study designs

We included RCTs, cohort studies, case-control studies, cross-sectional studies, and diagnostic and reliability studies. We excluded guidelines, letters, editorials, commentaries, unpublished manuscripts, dissertations, government reports, books and book chapters, conference proceedings, meeting abstracts, lectures and addresses, consensus development statements, guideline statements, cadaveric, laboratory or animal studies, qualitative studies, systematic reviews and meta-analyses.

Information sources

We developed our search strategy in consultation with a health sciences librarian, and a second librarian reviewed the strategy to ensure accuracy. We systematically searched three databases that thoroughly index the manual therapy literature published by various health professions from inception to November 25, 2019: MEDLINE (U.S. National Library of Medicine, through Ovid Technologies Inc.), Cumulative Index to Nursing and Allied Health Literature (CINAHL, through EBSCOhost), and Index to Chiropractic Literature (ICL, Chiropractic Library Collaboration). Search terms consisted of subject headings specific to each database (e.g., MeSH in MEDLINE) and free text words relevant to our objectives and study design [see Additional file 1]. We restricted our search to papers published in English.

Study selection

We used a two-phase screening process to identify eligible studies. In phase one screening, we reviewed titles and abstracts and classified articles as possibly relevant or irrelevant. During phase two screening, we reviewed the full text of possibly relevant articles for final determination of eligibility.

A trained investigator (MC) conducted all of the screening. Prior to phase one and phase two screening, we validated the quality of screening by MC. Ten percent of all eligible articles were randomly selected and the titles and abstracts (phase one) and full text (phase two) of these articles were screened independently by a second experienced investigator (CC). A 95% level of agreement was required between two reviewers before moving to full screening. Once the 95% agreement was achieved, one reviewer (MC) completed phase one and two screening.
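The agreement check described above can be sketched as a simple percent agreement calculation. This is an assumption made for illustration: the text does not state which agreement statistic was used, and a chance-corrected statistic such as kappa would give different values. The function name `percent_agreement` and the screening decisions are hypothetical.

```python
def percent_agreement(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Percentage of articles on which two screeners made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must screen the same non-empty sample")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical include (True) / exclude (False) decisions for a sample of
# 20 articles; the two screeners disagree on exactly one article.
mc = [True] * 5 + [False] * 15
cc = [True] * 4 + [False] * 16

agreement = percent_agreement(mc, cc)
proceed = agreement >= 95.0  # threshold stated in the text
print(f"agreement = {agreement:.1f}%, proceed to full screening: {proceed}")
```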

Risk of Bias in individual studies

The lead author (MC) critically appraised the internal validity of relevant articles using the Scottish Intercollegiate Guidelines Network (SIGN) criteria for RCTs, cohort studies and case-control studies [29, 30], a checklist created by Hoy et al. for cross-sectional studies [31], the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) for diagnostic studies [32] and the Quality Appraisal tool for studies of diagnostic Reliability (QAREL) for reliability studies [33].

We included a quality control step in the critical appraisal of studies. The investigator who assessed the risk of bias of the studies (MC) presented a summary of the critically appraised papers to four experienced methodologists (PC, SM, CC, VK) who validated the outcome of the appraisals. Disagreements regarding the internal validity of papers were resolved through discussion. The lead author created risk of bias tables for all eligible studies (Tables 1 and 2), which were validated by the other investigators (PC, SM, CC, VK). Studies were rated low risk of bias or at risk of bias.

Table 1 Risk of Bias Tables
Table 2 Risk of Bias Tables

Data extraction

The lead author (MC) extracted data from acceptable quality (low risk of bias) studies and built evidence tables stratified by study type (Tables 3 and 4). Data extraction of each study was validated by one of four reviewers (PC, CC, SM, VK) to ensure accuracy. We contacted the study authors when clarification or additional information/data was necessary to build the evidence tables [46]. Evidence tables summarized the pertinent information and were used to create summary statements describing the body of evidence.

Table 3 Evidence Tables
Table 4 Evidence Tables

Data items

Information extracted from each diagnostic study included the study design, sample population, case definition, index test, reference standard and results of the study. Information extracted from each reliability study included study design, sample size, sample description, measurement method and results of the study.

Statistical analyses

When data were available, we computed the measurement mean change (and 95% confidence intervals) from diagnostic studies. Confidence intervals (CI) were calculated using mean change in each group, standard deviation, total number of participants in each group, and α = 0.05.
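One way to compute such an interval from the listed inputs is sketched below in Python. The normal approximation (critical value of about 1.96 for α = 0.05) is an assumption made for illustration; the text does not state whether a normal or a t distribution was used. The function name `mean_change_ci` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_change_ci(mean_change: float, sd: float, n: int,
                   alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided confidence interval for a group's mean change.

    Uses the normal approximation: mean +/- z * sd / sqrt(n).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sd / sqrt(n)
    return mean_change - half_width, mean_change + half_width


# Hypothetical group: mean change 5.0 units, SD 10.0, 100 participants.
lower, upper = mean_change_ci(mean_change=5.0, sd=10.0, n=100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

For small groups, a t-distribution critical value would give a wider interval than this approximation.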

Evidence synthesis

We used the best evidence synthesis methodology to conduct a qualitative synthesis of the evidence from acceptable quality (low risk of bias) studies [57, 58]. The evidence synthesis provides conclusions based on the best available evidence or may conclude that there is insufficient evidence to make any conclusions [57].

We stratified diagnostic studies into one of four phases, as described by Sackett [28]. Phase one studies compare test results in patients with the target condition to those in people without the target condition [28]. Phase two studies test whether patients with certain test results are more likely to have the target disorder than patients with differing results [28]. Phase three studies determine whether test results distinguish patients with and without the target disorder among patients in whom it is clinically reasonable to suspect the disease is present [28]. Phase four studies determine whether patients who undergo the test have improved health outcomes compared to similar patients who are not tested [28].

Reporting of outcomes

If we retrieved relevant RCTs, we aimed to check the clinical trials registry (Clinicaltrials.gov) to assess for outcome reporting bias.

Results

Study selection

Our search retrieved 1053 citations (Fig. 2). We removed 94 duplicates and screened 959 articles. Inter-rater agreement for phase one screening was 95.8% between MC and CC. We screened 176 full-text articles (phase two). Inter-rater agreement for phase two screening was 95.4% between MC and CC. Of those, 23 articles met the inclusion criteria and were eligible for critical appraisal. Reasons for exclusion were ineligible publication type (n = 48), population not including patients presenting to chiropractors in the absence of red flags (n = 30), intervention did not include spinal radiographs (n = 12), did not have a comparison group (n = 27), outcomes were not structural or functional findings on radiographs (n = 33) and duplicates (n = 3).
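The counts reported above can be checked for internal consistency with simple arithmetic. The Python sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
retrieved = 1053
duplicates_removed = 94
screened_phase_one = retrieved - duplicates_removed

full_text_screened = 176
exclusions = {
    "ineligible publication type": 48,
    "population not eligible": 30,
    "no spinal radiographs": 12,
    "no comparison group": 27,
    "outcomes not radiographic findings": 33,
    "duplicates": 3,
}
excluded_phase_two = sum(exclusions.values())
included = full_text_screened - excluded_phase_two

print(f"screened in phase one: {screened_phase_one}")
print(f"excluded in phase two: {excluded_phase_two}")
print(f"eligible for critical appraisal: {included}")
```

The result matches the 959 screened articles and the 23 articles eligible for critical appraisal.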

Fig. 2 Flow diagram of study selection

Risk of Bias

We found no relevant studies investigating the diagnostic or therapeutic utility of routine or repeat radiographs for structural or functional evaluation of the spine. Similarly, we found no studies investigating the use of repeat radiographs for functional or structural findings of the spine to monitor clinically meaningful changes in conditions or care for patients (Fig. 2).

We critically appraised 23 studies investigating the validity or reliability of radiographs for the functional or structural evaluation of the spine. Of these, 14 were at risk of bias and excluded from the best evidence synthesis [34,35,36,37,38,39,40,41,42,43,44, 47,48,49]. These included 11 diagnostic studies and eight reliability studies (six of the 11 studies had both diagnostic and reliability components). The diagnostic studies with a risk of bias had methodological limitations including 1) inadequate population sampling (n = 5), and 2) inadequate blinding (n = 6). In the reliability studies with a risk of bias, methodological limitations included: 1) poor population and/or rater sampling (n = 5), 2) inadequate inter-rater, intra-rater or information blinding (n = 17), 3) no random sampling (n = 4) and 4) poor test application and interpretation (n = 3) (Tables 1 and 2). We did not identify any cohort, case-control, or cross-sectional studies. Additionally, we did not identify any RCTs, therefore we did not check the clinical trials registry.

We included nine low risk of bias studies in our best evidence synthesis; one diagnostic study [45], seven reliability studies [50,51,52,53,54,55,56] and one study with diagnostic and reliability components [46]. One reliability study provided further analyses to previously collected data, which were reported with the original studies [54]. These studies had some methodological limitations, but not in sampling, blinding or random sampling (Tables 1 and 2).

Study characteristics

We included eight reliability studies [46, 50,51,52,53, 55, 56]. Five examined the intra- and inter-rater reliability of Chiropractic BioPhysics® measurements; of these, four investigated cervical spine measurements [51, 53, 55, 56] and one studied lumbar spine measurements [52]. One study examined the intra- and inter-rater reliability of flexion-extension radiographs in addition to a standard cervical radiograph series [46], and one investigated the inter-rater reliability of vertebral rotation and tilt on lateral bending radiographs [50]. We included two phase two diagnostic (validity) studies [45, 46] that investigated whether patients with radiographic findings were more likely to have the target disorder than patients with other test outcomes [28]. One study investigated radiographic findings of spinal degeneration and cervical complaints [45] and the other investigated findings on flexion-extension radiographs of intersegmental clinical hypermobility [46].

Reliability of radiographic measurements

Four studies investigating Chiropractic BioPhysics® measurements of the cervical spine (i.e., anterior head translation, vertebral translation in the cervical and thoracic spine, cervical lordosis angle, cervicodorsal angle, absolute rotation angle, Ferguson’s angle, Cobb angle and intersegmental measurements) found that these were performed with acceptable levels of reliability (Tables 3 and 4) [51, 53, 55, 56]. One study that investigated Chiropractic BioPhysics® measurements of the lumbar spine (i.e., sacral base angle, lumbodorsal angle, lumbosacral angle and lumbar spine vertebral translation) also reported acceptable levels of reliability [52]. The one exception was the measurement of the arcuate angle, which had a low to acceptable level of reliability in the cervical and lumbar spine [52, 53].

For other radiographic measurements, Haas et al. found that categorizing vertebral body rotation and tilting into five categories may be associated with poor reliability and significant measurement error [50]. Similarly, McGregor et al. reported that measuring intersegmental motion of each vertebra in flexion and extension is associated with poor reliability and significant measurement error [46].

Validity of radiographic measurements

We did not identify any studies of acceptable methodological quality providing evidence of the diagnostic accuracy (sensitivity, specificity, predictive values) of Chiropractic BioPhysics® measurements. Thus, we do not know if these measurements are evaluating clinically important outcomes for conditions of the cervical or lumbar spine.

Two low risk of bias phase two diagnostic studies provided preliminary evidence of the diagnostic validity of using radiographs for functional and structural evaluation of the spine [45, 46]. McAviney et al. [45] investigated the association between cervical radiograph measurements and cervical spine complaints. The authors did not find significant differences in head anterior weight bearing between participants with or without cervical complaints [45]. However, they reported that participants with less than 20° of absolute rotation angle (a measure of cervical lordosis) were more than twice as likely to have cervical complaints as those who had more than 20° [45]. McGregor et al. [46] investigated the benefit of adding cervical flexion-extension radiographs to a normal series of cervical radiographs and standardized case report for the diagnosis of intersegmental clinical hypermobility. They reported no additional diagnostic benefit of using flexion-extension radiographs [46].

Clinical utility

We did not identify any relevant studies investigating the diagnostic or therapeutic utility of cervical, thoracic or lumbar radiographs (in the absence of red flags) for the functional or structural evaluation of the spine. Similarly, we did not identify any relevant studies that investigated whether functional or structural findings on repeat radiographs of the cervical, thoracic or lumbar spine are valid markers of clinically meaningful change when monitoring conditions or managing patients.

Discussion

Clinical utility refers to the degree to which the use of a test (such as radiographs) is associated with changing health outcomes through diagnosis or selection of an appropriate treatment [17,18,19]. We did not find evidence that cervical, thoracic or lumbar radiographs (in the absence of red flags) obtained for the purpose of evaluating the function or structure of the spine can benefit patients. Therefore, we do not recommend that routine or repeat radiographs of the cervical, thoracic or lumbar spine (in the absence of red flags) be used by chiropractors to evaluate the structure or function of the spine for diagnostic or therapeutic purposes.

Although we found eight reliability studies and two diagnostic (phase two) validity studies with a low risk of bias, these studies cannot be used to justify using routine or repeat radiographs of the spine [45, 46, 50,51,52,53,54,55,56]. While some measurements of cervical and lumbar spine radiographs have acceptable levels of reliability, and preliminary evidence of diagnostic validity, we did not identify any acceptable studies investigating their clinical utility.

Several evidence-based clinical practice guidelines are available to inform the use of radiographs in cases of trauma, or when pathology is suspected [3,4,5, 7]. Moreover, guidelines make clear recommendations against the use of radiographs to assess function of the spine [5, 7]. While our rapid review agrees with these statements, it nevertheless conflicts with recommendations published by the International Chiropractic Association in the document entitled: “Practicing Chiropractors’ Committee on Radiology Protocols (PCCRP) for Biomechanical Assessment of Spinal Subluxation in Chiropractic Clinical Practice”, a guideline frequently referenced by a subset of chiropractors [13]. The divergent conclusions are attributable to differences in methodology, in particular differences in the search strategy and selection of articles. The development of the PCCRP document did not include a risk of bias assessment of eligible studies. Thus the synthesis included low quality studies which likely biased the recommendations made by that guideline expert panel [59]. Furthermore, it is unclear whether the guideline expert panel had editorial independence; most members (17/25) of the guideline expert panel and investigators were members of the sponsoring organization [13].

Triano et al. [60] used a consensus process to assess the appropriateness of imaging as a diagnostic tool to guide the use of manual therapy. Despite the low quality and narrative nature of their review, they did not recommend the use of radiographs to localize the site of care for manual therapy. However, contrary to our findings, they recommended the use of static and motion radiographic studies to identify hypermobile but not hypomobile segments. Our study included two relevant low risk of bias studies [46, 50] suggesting poor reliability of radiographs to assess motion patterns, and one preliminary phase two diagnostic study [46], not included in their review, that clearly contradicts their recommendation.

We live in the era of value-based health care [61]. One of the goals of value-based health care is to reduce the utilization of low-value tests and interventions that do not benefit patients but increase the costs of care. Campaigns such as Choosing Wisely® have been designed and implemented to promote conversations between clinicians and patients by helping patients choose care that is: 1) supported by evidence; 2) not duplicative of other tests or procedures already received; 3) free from harm; and 4) truly necessary [62]. In 2017, the American Chiropractic Association adapted the Choosing Wisely® recommendations on lumbar spine radiography [63] and recommended avoiding routine spinal imaging in the absence of clear clinical indicators for patients with acute low back pain of less than 6 weeks’ duration. Furthermore, the American Chiropractic Association recommended that repeat imaging must not be used to monitor patients’ progress [62]. Our findings are in agreement with the American Chiropractic Association adapted Choosing Wisely® recommendations.

A principle of value-based health care is that clinical interventions should be free from harm, or at the very least, the benefits of the intervention must substantially outweigh the risks [63]. A known risk of exposure to ionizing radiation is an increased frequency of cancer, beyond that occurring spontaneously, and of non-cancer diseases (e.g., cardiovascular diseases) [64,65,66]. Studies have shown that approximately 100 mSv is the dose of radiation a patient must receive before there is a known increased risk of cancer over a lifetime [64, 67, 68]. The current widely used theory on radiation accumulation is based on the linear no-threshold (LNT) model, which in simple terms states that no dose of radiation exists without risk and that risk increases proportionally with dose [68, 69]. Currently, the direct risks associated with low doses, such as those received with radiographic studies, are unknown in the LNT model [64,65,66].

However, despite the ongoing debate about the LNT theory [70, 71], the argument remains that radiographic studies should not be considered in isolation, but viewed as part of the patient’s lifetime exposure. Exposure to ionizing radiation is cumulative and comes from natural sources, such as sunlight and the decay of elements in our environment, as well as man-made sources, such as medical imaging (i.e. radiographs, computed tomography (CT) and nuclear medicine scans) [62, 63]. The International Commission on Radiological Protection (ICRP) and the Canadian Nuclear Safety Commission (CNSC) therefore recommend that, in the absence of information pertaining to low-dose risks, the “as low as reasonably achievable” (ALARA) principle be followed [64]. ALARA is not a dose limit, but a practice that aims to keep the dose levels as far as possible below the regulatory limit [64, 72]. In light of the inherent risks of the use of ionizing radiation, and given that the clinical utility is unknown, the use of routine and repeat radiographs for the functional or structural evaluation of the spine is not recommended.

Our rapid review has limitations inherent to the rapid review methodology [21]. These limitations include: 1) a focused search of the literature (three databases), which may lead to studies being omitted from the review; and 2) screening, critical appraisal and data extraction being conducted by one investigator instead of two. However, we reduced the impact of these limitations by carefully selecting databases where the relevant literature is most likely to be published (MEDLINE, CINAHL, and ICL) and by implementing a structured quality assurance methodology to minimize error in screening and selection of articles, and data extraction.

Conclusion

Radiographs are an important diagnostic tool in patient management when clinical indicators of serious pathologies (red flags) are present. We found no evidence that radiographs used to assess the function or structure of the spine improve patients’ outcomes. Therefore, in the absence of red flags, and given the inherent risks of ionizing radiation, we do not recommend the clinical use of radiographs for the routine and repeat evaluation of the structure and function of the spine.