Low back pain (LBP) is the most prevalent musculoskeletal condition in the general population [1, 2]. The point prevalence of LBP ranges from 1 to 58.1% and the one-year prevalence ranges from 0.8 to 82.5% [3], depending on the LBP definition and population. LBP is the leading cause of years lived with disability and the sixth leading cause of disability-adjusted life years globally [4, 5]. It is associated with poor health-related quality of life and imposes a substantial economic burden on society [6, 7]. Non-specific LBP, which cannot be attributed to a specific underlying pathology, is more common than specific LBP (e.g., cancer, fractures, infectious disorders, or ankylosing spondylitis) [8].

The clinical assessment of low back pain involves completing a physical examination [9]. Manual palpation is a common tool used to assess patients with LBP [10]. It includes static and dynamic palpation of soft tissue or joints and aims to identify painful structures and biomechanical dysfunction of the spine [11]. However, the clinical utility of these tests is controversial.

Previous systematic reviews have investigated the reliability and validity of manual palpation for the assessment of patients with LBP [9, 11,12,13]. According to these reviews, the inter-rater reliability of static joint and soft-tissue palpation to locate pain is poor (kappa (k) ≤ 0.40), and the inter-rater reliability of static palpation for soft tissue changes (e.g., tension) is inconsistent [9, 11, 13]. Furthermore, one review reported that motion palpation may be valid in detecting decreased motion, or lack of end-play in the lumbar spine [12]. However, motion palpation may not be valid to detect aberrant motion of the sacroiliac joints [12]. These reviews are outdated and there is a need for an up-to-date systematic review. The purpose of our systematic review was to determine the reliability and validity of manual palpation used to assess adult patients with LBP.


Eligibility criteria


We included studies of adults (≥18 years) with LBP. LBP refers to pain or discomfort below the costal margin and above the inferior gluteal folds and can be with or without referred leg pain [14]. Our systematic review includes patients with non-radicular low back pain, radicular low back pain, spinal stenosis, degenerative or isthmic spondylolisthesis, and failed back surgery syndrome.


Our review focuses on studies assessing the reliability or validity of manual palpation for the assessment of patients with LBP. Reliability describes the consistency of measurements across people or instruments [15]. Validity is the degree to which a test measures what it is intended to measure [15].

Manual palpation is a diagnostic procedure where the examiner feels with their hands to assess the mobility and state of the soft and bony tissues [16]. Palpation techniques include both static and dynamic (motion) methods, which are often used to identify areas of tissue pain and dysfunction, target manual and manipulative therapies, and determine the effectiveness of the intervention [9]. Static palpation is used to identify asymmetry of bony landmarks, tender points, and trigger points, and to evaluate tissue texture, temperature and tone [17]. Motion palpation is used to assess the quantity and quality of movement through the lumbar spine and pelvis [17]. Motion palpation assessment can be continuous within the normal range of motion with joint play, or dynamic soft tissue palpation or end range assessment for end-feel or joint springing [17]. Palpation involving devices such as pressure algometry was excluded.


We aimed to evaluate clinical outcomes assessed by palpation. Outcomes included pain, segmental mobility and stiffness for static joint palpation; joint movement and position for motion joint palpation; and pain, tenderness, trigger points and muscle contraction for static soft tissue palpation.

Study characteristics

Eligible studies met the following inclusion criteria: 1) English or French language; 2) published in peer-reviewed journals between January 1, 2000 and July 11, 2019; 3) assessing the reliability or validity of manual palpation. Previously published systematic reviews on this topic were included in our review. Findings of studies published before 2000 were examined by comparing our systematic review with these previous systematic reviews. We excluded: 1) letters, guidelines, editorials, commentaries, unpublished manuscripts, dissertations, reports, book chapters, conference proceedings and abstracts, lectures, addresses, and consensus statements; 2) cadaveric and animal studies; 3) literature reviews and case studies; 4) studies targeting individuals with serious pathology (e.g., fractures, dislocations, systemic disease, myelopathy, neoplasm and infection); and 5) studies with sample size < 20 per group.

Search strategy and data sources

The search strategy was developed in consultation with a health sciences librarian, and a second librarian reviewed it for accuracy and completeness using the Peer Review of Electronic Search Strategies (PRESS) checklist [18]. We systematically searched the following electronic databases: MEDLINE, CINAHL, PubMed, Cochrane Central Register of Controlled Trials, and SPORTDiscus. Search terms consisted of subject headings specific to each database (e.g., MeSH in MEDLINE) and free text words relevant to LBP, diagnosis, reliability, validity, and palpation (Additional file 1).

Study selection

Identified citations were exported into EndNote for reference management and tracking of the screening process. We screened articles in two stages. In stage one, titles and abstracts were screened for their relevance by pairs of independent reviewers (NL, PN, ALM). Stage two involved screening the full text article of all possibly relevant citations from stage one. Disagreements on screening stages were discussed between reviewers to reach consensus. When consensus could not be reached, a third reviewer independently screened the citation and discussed with the two reviewers to reach consensus.

Assessment of risk of bias

Three reviewers (NL, PN, ALM) critically appraised all relevant studies (Tables 1 and 2) using the modified Quality Appraisal Tool for Studies of Diagnostic Reliability (QAREL) [33] criteria to assess the internal validity of the diagnostic reliability studies and the modified Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) [34] criteria to assess diagnostic accuracy/validity studies (Additional files 2 and 3). The original QAREL and QUADAS-2 instruments were modified to include: 1) not applicable options; 2) a question regarding the clarity of the study objective; and 3) the Sackett and Haynes classification (phases of validity studies in QUADAS-2 instrument). If a study was judged as “low” on all domains relating to bias or applicability then it was appropriate to have an overall judgment of “low risk of bias” or “low concern regarding applicability” for that study. If a study was judged “high” or “unclear” on one or more domains then it may be judged “at risk of bias” or as having “concerns regarding applicability” [33, 34]. We included low risk of bias studies in our best evidence synthesis.

Table 1 Risk of bias for scientifically admissible reliability studies based on the modified QAREL criteria
Table 2 Risk of bias for scientifically admissible validity studies based on the modified QUADAS-2 criteria

Validity studies with low risk of bias were classified into one of four phases of investigation following the recommendation of Sackett and Haynes [35]. The purpose of phase I studies is to determine whether test results differ between LBP patients and healthy controls. This information is useful to justify phase II studies. Phase II studies aim to determine whether patients with a positive palpation result are more likely to have decreased function, severe disability or structural changes (e.g., spinal stenosis) than patients with a negative result. Phase I and II studies provide preliminary evidence that a test should be tested in phase III studies. On their own, results from phase I and II studies cannot be used to confirm the validity of tests; according to the Sackett and Haynes classification, they only justify further investigation of a test. Phase III studies aim to determine whether a test result can distinguish between LBP patients with suspected conditions (e.g., radiculopathy). Finally, phase IV studies aim to determine whether patients who undergo a manual palpation test have a better prognosis than similar patients who were not tested [35]. Phase IV studies are a distinct type of study that differs from phase I–III studies, which examine diagnostic accuracy. The risk of bias of phase IV studies would be assessed using the Scottish Intercollegiate Guidelines Network (SIGN) criteria [36].

Data extraction and synthesis of results

One reviewer (PN) extracted data from low risk of bias studies and built evidence tables (Tables 3 and 4), and two reviewers (NL or HY) verified the accuracy and completeness of the data extraction. The reliability and validity studies were stratified according to targeted body structures (joint or soft tissue), technique (static or motion palpation), and clinical outcome (pain provocation, mobility, or stiffness). We used qualitative synthesis to synthesize the best evidence [37]. Eligible statistics included 1) means, medians and/or percentages in phase I studies; 2) correlations, sensitivity, specificity, positive predictive value, negative predictive value and/or likelihood ratios in phase II or III studies; and 3) prevalence in phase III studies.

Table 3 Evidence table for low risk of bias studies assessing the reliability of manual palpation tests in patients with low back pain
Table 4 Evidence table for low risk of bias studies assessing the validity of manual palpation tests in patients with low back pain

No arbitrary classification was used to report the strength of reliability or validity findings. Such classifications use arbitrary cut-points that do not take into account the level of misclassification that is acceptable in a specific context. Rather, values of kappa coefficients, sensitivity, specificity, etc. were reported, and the authors interpreted the kappa and measurement errors according to the clinical setting and purpose of each palpation test. As a rough guide, kappa scores < 0.6 are considered to indicate no, minimal or weak agreement and kappa scores > 0.6 are considered to indicate moderate, strong or almost perfect agreement [38].

Statistical analyses

We computed kappa coefficients (k) and 95% confidence intervals (CI) to determine the inter-rater reliability of our screening methodology of articles. We computed the percentage agreement between reviewers for the classification of articles into high or low risk of bias.
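As an illustration of the kind of calculation described above, the following is a minimal sketch of Cohen's kappa with an approximate 95% CI for two reviewers making include/exclude screening decisions. The function name `cohen_kappa`, the simple large-sample standard error, and the counts in the example table are assumptions made for illustration; they are not the software, formula, or data used in this review.

```python
import math


def cohen_kappa(table):
    """Return Cohen's kappa and an approximate 95% CI.

    table[i][j] is the number of citations that reviewer A placed in
    category i and reviewer B placed in category j (square table).
    The CI uses a simple large-sample standard error (an assumption).
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of citations on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two reviewers' marginal proportions.
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(row_p[i] * col_p[i] for i in range(k))
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


# Hypothetical counts (not taken from this review):
# rows = reviewer A (include, exclude), columns = reviewer B.
kappa, (ci_low, ci_high) = cohen_kappa([[20, 5], [10, 15]])
print(f"kappa = {kappa:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Percentage agreement, as used for the risk of bias classification, corresponds to `p_o` alone, without the correction for chance agreement.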


This review complies with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Additional file 4) [39]. The Statement for Reporting Studies of Diagnostic Accuracy (STARD) was used to inform the critical appraisal with the QAREL and QUADAS-2 [40].


Study selection

We identified 2307 citations (plus 3 citations from other sources), removed 287 duplicates, and reviewed 2023 articles for eligibility (Fig. 1). In stage 1 screening, 1976 citations were ineligible. Forty-seven papers were reviewed in stage 2, and 31 were excluded: ineligible study population (n = 11) [41,42,43,44,45,46,47,48,49,50,51], inappropriate outcome measure (n = 6) [52,53,54,55,56,57], ineligible publication type (n = 4) [58,59,60,61], ineligible sample size (n = 3) [62,63,64], study design (n = 3) [65,66,67] and did not investigate manual palpation (n = 4) [68,69,70,71]. Two authors were contacted regarding publication type and age range; both responded [27, 59].
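The counts reported in this paragraph can be cross-checked with simple arithmetic. The short sketch below recomputes each stage of the selection flow from the numbers stated in the text; the variable names are illustrative only.

```python
# Counts as reported in the study selection paragraph.
identified_databases = 2307
identified_other = 3
duplicates = 287

# Citations screened at stage 1 (titles and abstracts).
screened = identified_databases + identified_other - duplicates

# Full-text articles reviewed at stage 2.
excluded_stage1 = 1976
full_text = screened - excluded_stage1

# Stage 2 exclusions by reason, as listed in the text.
excluded_stage2 = {
    "ineligible study population": 11,
    "inappropriate outcome measure": 6,
    "ineligible publication type": 4,
    "ineligible sample size": 3,
    "study design": 3,
    "did not investigate manual palpation": 4,
}

# Articles carried forward to critical appraisal.
appraised = full_text - sum(excluded_stage2.values())

print(screened, full_text, sum(excluded_stage2.values()), appraised)
# 2023 47 31 16
```

The recomputed totals agree with the reported figures of 2023 articles screened, 47 full-text articles reviewed, 31 excluded at stage 2, and 16 critically appraised.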

Fig. 1 Identification and selection of articles on reliability and validity of manual palpation used to assess patients with low back pain. *Not mutually exclusive

We critically appraised 16 articles; 14 had low risk of bias and were included in our evidence synthesis [19,20,21,22,23,24,25,26,27,28,29,30,31,32] (Fig. 1). Of the 16 articles appraised, 14 articles comprising 17 studies were included (three articles examined both reliability and validity). The inter-rater agreement for screening of articles was k = 0.86 (95% CI 0.73–0.98). The percentage agreement for the admissibility of studies was 100% (17 agreements/17 studies over the 16 articles appraised).

Study characteristics

Fourteen articles had a low risk of bias [19,20,21,22,23,24,25,26,27,28,29,30,31,32]. Of those, 11 reported on the reliability of palpation tests [19,20,21,22,23,24,25,26,27,28,29] and six reported on validity [22, 28,29,30,31,32]. Three articles examined both reliability and validity [22, 28, 29].

The eleven reliability studies with low risk of bias examined inter-rater reliability of manual palpation to assess joint mobility or motion [19,20,21, 23, 26, 27], pain [19, 21, 23,24,25,26, 28, 29] and muscle contraction [22]. Two of the eleven studies also examined intra-rater reliability of manual palpation assessing joint motion [20] and muscle tenderness [24]. The six validity studies included one phase I study on palpation of joints and muscles to assess pain [29]; four phase II studies on palpation of nerves to elicit pain [28], spinal stiffness [31], muscle contraction [22] and sacroiliac joint motion [32]; and one phase III study on palpation of gluteal muscle for tenderness and pain [30].

The 14 low risk of bias articles investigated: 1) static joint palpation (n = 7) [19, 21, 23, 25, 26, 29, 31], 2) motion joint palpation (n = 3) [20, 27, 32], and 3) static soft tissue palpation (n = 5) [22, 24, 28,29,30] (Tables 3 and 4). They assessed various techniques: 1) joint pain provocation [19, 21, 23, 26, 29], 2) pain or tenderness of muscles [24, 29, 30], 3) pain and tenderness of nerves [28], 4) joint stiffness/mobility [19, 21, 23, 25, 26, 31], 5) joint motion [20, 27, 32], and 6) isometric muscle contraction [22]. Table 5 provides a glossary of definitions for all of the palpation tests included in the articles.

Table 5 Glossary of Manual Palpation Tests in Accepted Articles

The duration of LBP varied across studies: < 7 weeks (1/14 articles) [21], > 4 weeks (1/14 articles) [24], ≥ l months (1/14 articles) [29], new episode to > 3 months (1/14 articles) [19] and unspecified duration (10/14 articles) [20, 22, 23, 25,26,27,28, 30,31,32]. The studies were conducted in Australia [21], Canada [30], Denmark [24], Iran [20, 32], Ireland [28], and the United States [19, 22, 23, 25,26,27, 29, 31] between 2003 and 2017.

We did not perform a meta-analysis because of the heterogeneity of studies in symptom duration, palpation technique, and outcome specification.

Assessment of risk of bias

Tables 1 and 2 show the risk of bias for scientifically admissible reliability and validity studies based on the modified QAREL and QUADAS-2 criteria, respectively.

The low risk of bias studies met the following criteria: 1) clearly described objective; 2) representative sample; 3) representative raters; 4) blinding of the test results between raters; 5) appropriate and valid standard test; and 6) appropriate statistical analysis (Tables 1 and 2). However, these studies had the following limitations: 1) unclear time interval between tests (n = 1) [27]; 2) no blinding for intra-examiner reliability (n = 2) [20, 24]; 3) a 30 min rest period between repeat testing by the same examiner and no blinding to clinical information (n = 2) [27, 28]; 4) unclear blinding to clinical information or additional clues (n = 1) [23]; 5) no blinding to clinical information and unclear blinding to additional clues (n = 8) [21, 22, 23, 24, 25, 27, 28, 29]; and 6) non-random or unclear administration of tests (n = 5) [19, 23, 27,28,29]. Most validity studies had appropriate exclusion criteria and blinding. However, validity studies had limitations: 1) four studies did not use a consecutive or random sample [22, 29, 31, 32]; 2) two studies were unclear as to whether an appropriate time interval between tests was used [28, 31]; 3) one study was unclear as to whether an appropriate reference standard (slump test and straight leg raise) was used [28]; 4) in one study the examiner was not blinded to the results of the index or reference test [32]; and 5) in one study it was unclear whether all patients were included in the analysis [22].

Two validity studies were excluded after critical appraisal. Abbott et al. used flexion/extension radiographs as a reference standard without establishing the test-retest reliability of patient positioning when taking the radiographs [72]. Telli et al. did not use blinding in their reliability study [73].

Summary of evidence

Reliability of joint and bony structure palpation

Static palpation

Four studies investigated static palpation to elicit pain. Overall, these studies suggest that important measurement error is associated with eliciting pain from: 1) lumbar facet joints (inter-rater reliability 0.38 ≤ k ≤ 0.73); 2) lumbar spinous processes (inter-rater reliability 0.21 ≤ k ≤ 0.57); and 3) sacro-iliac (SI) joints (inter-rater reliability 0.14 ≤ k ≤ 0.59) [19, 23, 26, 29] (Table 3). Similarly, the evidence suggests that static palpation used to identify joint segmental mobility has low inter-rater reliability (i.e., lumbar facet joints: − 0.17 ≤ k ≤ 0.17; lumbar spinous processes: − 0.02 ≤ k ≤ 0.26; and SI joints: − 0.11 ≤ k ≤ − 0.10) [19, 23, 26]. The inter-rater reliability of the prone instability test for pain was k = 0.30 [23], 0.41 [19] and 0.54 [26] in the relaxation phase of the test and k = 0.46 [26], 0.71 [19] and 0.87 [23] in the contraction phase of the test. One study that combined the two phases of the test into a positive or negative finding reported a kappa of 0.10 [25] (Table 3). Furthermore, a study by Downey et al. (2003) reported low inter-rater reliability of joint static palpation to locate the spinal level (0.23 ≤ k ≤ 0.54) and name the spinal level (− 0.13 ≤ k ≤ 0.41) in patients with LBP symptoms [21] (Table 3).

Motion palpation

We found inconsistent evidence in support of the reliability of motion palpation of the lumbar spine and SI joints to assess joint motion [20, 27]. The reliability of motion palpation of the sacroiliac joint varied (inter-rater reliability 0.14 ≤ k ≤ 0.75 and intra-rater reliability 0.23 ≤ k ≤ 0.73) (Table 3) [20, 27]. Tong et al. (2006) suggested that sacral position cannot be reliably assessed during trunk motion using the sacral base position test (inter-rater reliability: flexion k = 0.37, extension k = 0.05) [27].

Reliability of soft tissue palpation

Static palpation

We found varying levels of reliability for the palpation of the soft tissue structures associated with low back pain [22, 24, 28, 29]. The inter-rater reliability ranged from k = 0.80 for sciatic nerve pain, to 0.51 ≤ k ≤ 0.68 for gluteal tender points and k = 0.34 for lumbar paraspinal muscle pain [24, 28, 29]. One study suggested that the multifidus muscle can be reliably assessed for abnormal isometric contraction by examiners who believe they are palpating the multifidus muscle; the test involves palpating lateral and adjacent to the interspinous space of L4-L5 and L5-S1 during contralateral arm raising, both with and without hand weights (inter-rater reliability 0.75 ≤ k ≤ 0.81) [22]. It is possible that the multifidus lift test also palpates a more superficial muscle, which raises questions about the validity of this test.

Validity of joint and bony structure palpation

Static palpation

Two studies investigated the validity of static joint palpation [29, 31]. One phase I study found that pain elicited by palpation of the SI joints and lumbar spinous processes was more common in LBP patients compared to healthy controls [29]. One phase II study reported that posterior to anterior palpation used to identify stiffness from L1-L5 had a sensitivity of 38% (95% CI 21–59%), a specificity of 45% (95% CI 28–62%), a positive likelihood ratio of 0.69 (95% CI 0.37–1.31) and a negative likelihood ratio of 1.38 (95% CI 0.82–2.33) when compared to a mechanized indentation device [31] (Table 4).
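The likelihood ratios reported for posterior to anterior palpation follow directly from the reported sensitivity and specificity. The sketch below shows the standard relationships, using the point estimates quoted above; the function name `likelihood_ratios` is illustrative, and the confidence intervals are not reproduced because they require the underlying 2 × 2 counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    # LR+ = P(test positive | condition) / P(test positive | no condition)
    lr_positive = sensitivity / (1 - specificity)
    # LR- = P(test negative | condition) / P(test negative | no condition)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Point estimates reported for posterior to anterior palpation [31].
lr_pos, lr_neg = likelihood_ratios(0.38, 0.45)
print(round(lr_pos, 2), round(lr_neg, 2))  # 0.69 1.38
```

A positive likelihood ratio below 1 and a negative likelihood ratio above 1 mean that a positive finding makes stiffness, as measured by the reference device, slightly less likely rather than more likely.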

Motion palpation

One phase II study investigated the validity of joint motion palpation tests for the sacroiliac joints [32]. They examined the relationship between sacroiliac tests for joint motion (Gillet test, sitting flexion test and standing flexion test) and sacroiliac pain provocation tests (Faber test, thigh thrust test and resisted abduction test) but did not use statistics for validity (Table 4).

Validity of soft tissue palpation

Static palpation

Four studies investigated the validity of static soft tissue palpation [22, 28,29,30]. One phase I study found that pain elicited by palpation of the lumbar paraspinal and piriformis muscles was more common in LBP patients compared to those without LBP [29]. A phase II study tested the validity of the multifidus lift test with and without hand weights to identify abnormal isometric multifidus muscle contraction when compared to measurement with real-time ultrasound imaging of lumbar multifidus muscle thickness [22] (Table 4). The authors reported that the multifidus lift test correlated with ultrasound findings at the L4–5 level (r biserial correlation coefficients of 0.59 and 0.73 across the conditions with and without hand weight) and was weakly associated at the L5-S1 level (r biserial correlation coefficients of 0.17 and 0.47) (Table 4) [22]. Another phase II study investigated the validity of sciatic nerve palpation between the ischial tuberosity and the greater trochanter for pain, using the straight leg raise and slump test as the reference standard to evaluate mechanosensitivity of the sciatic nerve [28]. The authors found that sciatic nerve palpation had a sensitivity of 85% (95% CI, 75–95%) and a specificity of 60% (95% CI, 46–74%) [28]. Finally, one phase III study investigated the validity of static palpation of gluteal muscle for taut band, tenderness and pain recognition compared to expert panel confirmation of radicular LBP (informed by MRI and electro-diagnostic testing). The authors reported that static palpation of the gluteal muscle had a sensitivity of 74.1% (95% CI, 67.7–80.3%) and a specificity of 91.4% (95% CI, 86.8–96.0%) in identifying radicular pain [30].


Summary of results

We reviewed the reliability and validity of manual palpation used to assess patients with LBP. We retrieved eleven studies on the reliability of static and motion palpation of joint and soft tissue. Overall, the evidence suggests that static joint palpation is not reliable in identifying pain and segmental mobility of the lumbar facet joints, lumbar spinous processes and SI joints, or the location of the spinal level contributing to LBP symptoms. However, static soft tissue palpation may help reliably identify gluteal tender points, sciatic nerve pain, and multifidus contraction but not lumbar paraspinal muscle pain. We identified six validity studies for the assessment of LBP using static joint, joint motion and soft tissue palpation. Gluteal muscle palpation for pain was able to help differentiate LBP patients with or without radiculopathy (phase III study). We found preliminary evidence for the validity of piriformis and lumbar paraspinal muscle palpation for pain (phase I study), spinous process and sacroiliac joint palpation for pain (phase I study), sciatic nerve palpation for pain to identify mechanosensitivity of the sciatic nerve as determined by the straight leg raise and slump test (phase II study) and the multifidus lift test to help identify abnormal isometric contraction (phase II study); and against posterior to anterior palpation used to identify stiffness from L1-L5 spine levels (phase II study). Sacroiliac joint motion tests were not associated with sacroiliac pain provocation tests (phase II study). Overall, very little knowledge is available to support the usefulness of lumbar and sacroiliac palpation tests when examining patients with low back pain.

Comparison with previous systematic reviews

The results of our systematic review differ from previous systematic reviews [9, 11, 13]. Our finding that static joint palpation of the spinous processes, facet and sacroiliac joints is not reliable to identify pain disagrees with previous systematic reviews [9, 11, 13]. Three reviews reported that the reliability of static joint palpation for pain was acceptable, but the kappa used to make this conclusion is low (k ≥ 0.4) [9, 11, 13]. Our review disagrees with the previous finding by Stochkendahl et al. that static soft tissue palpation may help reliably identify soft tissue pain (k ≤ 0.4) [11]. Our review found inconsistent reliability to identify soft tissue pain with the inclusion of three recent studies [22, 24, 28]. The different conclusions may be due to different search strategies, new evidence, inclusion of small sample studies, use of self-developed checklists, or use of predefined cut-off points to differentiate low and high quality studies in the four systematic reviews. However, our results are consistent with a systematic review published in 2020 focusing only on segmental motion palpation [74]. That review reported poor evidence regarding the reliability and validity of segmental motion testing and concluded that clinical use of stand-alone tests cannot be recommended [74].

Strengths and limitations

Our systematic review has several strengths. First, our comprehensive search strategy of multiple databases was developed by a health sciences librarian in consultation with content experts and was then reviewed by an independent health sciences librarian using the PRESS checklist [18]. Second, we used detailed, predefined inclusion and exclusion criteria to capture a broad range of possibly relevant citations. Third, we used paired independent reviewers to screen and critically appraise citations to minimize bias and error. The critical appraisal was completed by trained reviewers using standardized quality assessment tools (QAREL/QUADAS-2). Fourth, bias in reported results was minimized by performing a best-evidence synthesis that included only high-quality studies. Finally, we only included studies that tested subjects with LBP. This makes our results more generalizable to the patients seen by practitioners in clinical practice.

Our review also had limitations. First, our search was limited to studies published in English and French languages. It is possible that relevant studies in other languages may have been excluded. Second, our search may not have retrieved all relevant studies, although our search strategy was comprehensive and the search was conducted in multiple major medical databases. Third, our search was limited to studies published after 2000. Fourth, it is possible that individual differences in scientific judgment could have resulted in varied critical appraisal outcomes among reviewers. This bias was minimized using training with the standardized assessment tools and a consensus process for determining internal validity of studies. Finally, studies examining motion palpation tests had smaller sample sizes (validity studies n = 50; reliability studies n = 49) than studies of static joint or muscle palpation. This may have limited the precision of the results and led to uncertainty in our assessment of motion palpation tests.

Clinical implications

Our review found very little evidence for the use of manual palpation to assess low back pain patients. Manual palpation tests suffered from misclassification error in that they were unable to differentiate subjects with LBP from subjects without LBP. Soft tissue palpation of the sciatic nerve and gluteal muscles for pain and of the multifidus muscle for isometric contraction was reliable, but these tests have not been tested sufficiently for their validity for use in clinical practice. However, we did find that gluteal muscle palpation of trigger points and taut bands is valid to differentiate LBP patients with or without radiculopathy in a clinical setting. We found very limited evidence to support the use of joint palpation, and clinicians should reconsider its diagnostic value when assessing patients with low back pain.


We synthesized the evidence on the reliability and validity of manual palpation to assess adults with LBP. The evidence does not support the reliability of joint palpation, but static soft tissue palpation is reliable. There is little evidence on motion joint palpation used in LBP patients. Gluteal muscle palpation for pain was able to differentiate LBP patients with or without radiculopathy (phase III study). We found preliminary evidence from phase I and II validity studies for some palpation tests. High quality phase III and IV validity studies are required to understand the diagnostic value of manual palpation tests in the assessment of adults with LBP. Clinicians must reconsider the usefulness of these tests when examining patients.