Background

The diagnostic value of many physical tests in orthopedic practice has been called into question, and a number of these tests have been found to correspond poorly with anatomical models [1, 2]. In some cases, clinicians proceed directly to more invasive or technologically involved ‘definitive’ investigations; however, this is not always desirable, practical or economical [3]. For example, this more direct approach has been blamed for diagnostic delays and misclassification of hip joint pathologies [4].

Recently, several diagnostic reviews of physical tests of the hip have been published [5–8], and they generally support the view that most studies are of low to moderate quality. Three of these reviews examined labral pathologies and/or femoroacetabular impingement [5, 6, 8], while a fourth looked at a wider range of pathologies [7]. This systematic review aims to build on these reviews by assessing a broad range of hip pathologies and by employing a more selective approach to the inclusion of studies, in order to gauge diagnostic performance accurately for the purposes of making recommendations for clinical practice and future research. We aim to determine:

  i) which physical tests of the hip or physical clinical prediction rules have valid evidence from which their diagnostic performance in clinical practice can be calculated;

  ii) whether any physical tests or clinical prediction rules have strong diagnostic utility; and

  iii) whether any physical tests or clinical prediction rules have moderate diagnostic utility.

Methods

In this systematic review, a preliminary search of textbooks, medical journal databases, websites and grey literature sources was conducted to identify physical tests of the hip. Subsequently, an electronic database search strategy was developed with the aid of a medical librarian (see Additional file 1) and applied to Medline (1950–July 2010), Embase (1980–July 2010), Embase Classic (1947–1979) and the Cumulative Index to Nursing and Allied Health Literature (CINAHL) (1982–July 2010). A follow-up search was performed in March 2013 using Medline, Embase and CINAHL to identify studies published in the interim period following the original search (see Additional file 1).

Studies included in our review were required to:

  i) compare a physical (index) test for the diagnosis of a particular hip pathology against a ‘gold standard’ (reference) test representing the true diagnostic result. Physical tests were defined as non-invasive bedside maneuvers, beyond inspection, point tenderness and palpation alone, which were intended to increase the probability of a particular diagnosis; and

  ii) report sufficient information to construct complete 2×2 contingency tables; and

  iii) recruit predominantly adult populations (where ages were indicated); and

  iv) be written in English.

Studies were excluded if they:

  i) used physical tests under anesthesia or intra-operatively; or

  ii) used physical tests to diagnose vascular or neurologic pathologies.

Studies were also excluded if they did not meet our criteria for internally and externally valid methodology. These criteria are listed below.

  iii) For the purposes of internal validity, reference tests could not: (1) be dependent upon the index test result for interpretation, (2) be discredited for diagnosing the chosen pathology, or (3) allow for only partial construction of 2×2 contingency tables (e.g. by excluding persons with negative index test results from the study).

  iv) For the purposes of external validity, (1) the sample population had to reasonably represent a typical population presenting for diagnosis in clinical practice (e.g. studies could not use healthy or asymptomatic controls who had no indications for testing), and (2) the index test needed to provide a threshold for dichotomizing results.

Assessments of validity were made independently by two authors, and disputes were arbitrated by a third author. No further restrictions were placed on study design, date of publication or clinical setting.

For the literature search in 2010, one author screened citations for inclusion on the basis of their title. The remaining citations were assessed independently by two authors, first by title and abstract and then by full text. Disagreements regarding inclusion were resolved by arbitration with the remaining authors. When new tests were identified, new search strategies were executed for them using Medline, Embase and Embase Classic (see Additional file 1). The follow-up literature search and sorting process in March 2013 were conducted entirely by a single author.

The diagnostic performances of included physical tests are presented in terms of sensitivity, specificity, predictive values and likelihood ratios (LRs), with the latter being used to further identify tests demonstrating “strong” and “moderate” diagnostic utility. We favor the use of likelihood ratios because they offer the most valuable and comprehensive diagnostic information for the individual patient [9, 10]. Roughly speaking, tests with positive LRs greater than or equal to 10 or negative LRs less than or equal to 0.1 cause almost conclusive (“strong”) changes in the post-test probability of disease, while positive LRs between 5 and 9.99 and negative LRs between 0.11 and 0.2 cause “moderate” changes [9]. To limit the uncertainty caused by studies recruiting small sample populations, we required “strong” tests to meet our likelihood ratio criteria across their entire 95% confidence intervals (otherwise the test was classified as “moderate”). When diagnostic data were presented only as percentages or fractions, we attempted to convert them back to integer counts to determine the original number of participants in each cell of the 2×2 contingency table. We only pooled data from studies involving exactly the same index test and target pathology.
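The calculations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: the `TwoByTwo`, `summarize` and `classify` names are hypothetical; the 95% confidence intervals use the common log (delta) method for a ratio of two proportions, which the text does not specify; tables with a zero cell, which would need a continuity correction, are simply rejected; and the “moderate” band is treated as continuous (positive LR of at least 5, negative LR of at most 0.2), which is an assumption about how values falling between the stated bands would be handled.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% standard normal quantile


@dataclass
class TwoByTwo:
    """Hypothetical container: index test result against reference test result."""
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative


def ratio_with_ci(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """(x1/n1) / (x2/n2) with a 95% CI from the log (delta) method."""
    ratio = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    half_width = Z_95 * se_log
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


def summarize(t: TwoByTwo) -> dict:
    """Sensitivity, specificity, predictive values and LRs from a complete table."""
    if min(t.tp, t.fp, t.fn, t.tn) == 0:
        raise ValueError("zero cell: a continuity correction would be needed")
    diseased = t.tp + t.fn  # reference positive
    healthy = t.fp + t.tn   # reference negative
    return {
        "sensitivity": t.tp / diseased,
        "specificity": t.tn / healthy,
        # Predictive values reflect the prevalence in this particular sample.
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # LR+ = sensitivity / (1 - specificity)
        "lr_pos": ratio_with_ci(t.tp, diseased, t.fp, healthy),
        # LR- = (1 - sensitivity) / specificity
        "lr_neg": ratio_with_ci(t.fn, diseased, t.tn, healthy),
    }


def classify(lr_pos: tuple, lr_neg: tuple) -> dict:
    """'strong' needs the whole 95% CI past the cut-off; otherwise use the point estimate."""
    pos, pos_low, _ = lr_pos
    neg, _, neg_high = lr_neg
    if pos_low >= 10:
        rule_in = "strong"
    elif pos >= 5:
        rule_in = "moderate"
    else:
        rule_in = "neither"
    if neg_high <= 0.1:
        rule_out = "strong"
    elif neg <= 0.2:
        rule_out = "moderate"
    else:
        rule_out = "neither"
    return {"rule_in": rule_in, "rule_out": rule_out}


# Hypothetical table, for illustration only (not data from the review).
summary = summarize(TwoByTwo(tp=95, fp=2, fn=5, tn=198))
print(summary["lr_pos"], summary["lr_neg"])
print(classify(summary["lr_pos"], summary["lr_neg"]))
```

In this hypothetical table the negative LR point estimate (about 0.05) lies in the “strong” range, but its upper 95% limit (about 0.12) does not, so the rule labels the test moderate for ruling out; the positive LR (95, lower limit about 24) is strong for ruling in.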

Results

Only a small proportion of hip tests identified in our preliminary search had their diagnostic performance assessed in methodologically valid primary studies. We identified sixteen studies containing data that satisfied our inclusion and exclusion criteria [11–26] (Figure 1). These yielded a total of 56 independent test-pathology combinations (Additional file 2).

Figure 1 Flow diagram of study inclusions and exclusions.

Two physical tests demonstrated strong diagnostic utility (Table 1): the patellar-pubic percussion (PPP) test strongly excluded radiologically occult hip fractures (negative LR 0.05, 95% CI 0.03–0.08) [26], and the hip abduction sign strongly diagnosed sarcoglycanopathies in patients with known muscular dystrophies (positive LR 34.29, 95% CI 10.97–122.30) [20]. The original descriptions of these tests from the primary studies can be found in Additional file 2.
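To show what these likelihood ratios imply for an individual patient, the sketch below applies Bayes' theorem in odds form (post-test odds = pre-test odds × LR) to the reported estimates and checks them against the “strong” criteria from the Methods. The function name is hypothetical, and the 30% pre-test probability is an assumed value chosen purely for illustration; it is not the prevalence in either primary study.

```python
def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Likelihood ratios reported above: (point estimate, lower 95% limit, upper 95% limit).
PPP_NEGATIVE_LR = (0.05, 0.03, 0.08)
HIP_ABDUCTION_POSITIVE_LR = (34.29, 10.97, 122.30)

# Both clear the "strong" cut-off across the whole 95% interval.
assert PPP_NEGATIVE_LR[2] <= 0.1
assert HIP_ABDUCTION_POSITIVE_LR[1] >= 10

PRE_TEST = 0.30  # hypothetical pre-test probability, for illustration only

for label, lrs in [("negative PPP test", PPP_NEGATIVE_LR),
                   ("positive hip abduction sign", HIP_ABDUCTION_POSITIVE_LR)]:
    probs = [post_test_probability(PRE_TEST, lr) for lr in lrs]
    print(f"{label}: {PRE_TEST:.0%} -> {probs[0]:.1%} "
          f"(95% CI {probs[1]:.1%} to {probs[2]:.1%})")
```

With this hypothetical 30% pre-test probability, a negative PPP test lowers the probability of an occult fracture to about 2% (roughly 1% to 3% across the interval), whereas a positive hip abduction sign raises the probability of a sarcoglycanopathy to about 94% (roughly 82% to 98%).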

Table 1 Diagnostic performances of independent physical test-hip pathology combinations with strong clinical diagnostic utility a

Fifteen independent test-pathology combinations demonstrated moderate, but not strong, diagnostic utility (Table 2). These included five tests for diagnosing symptomatic osteoarthritis [25], seven tests for diagnosing loosening of various components after total hip arthroplasty (THA) [23] and three tests for diagnosing and excluding various hip fractures [11, 13, 24].

Table 2 Diagnostic performances of independent physical test-hip pathology combinations with moderate clinical diagnostic utility a

Discussion

Previous reviews of physical tests have found much of the existing literature to be methodologically flawed and insufficient for guiding clinical practice. This review sought to identify clinically useful physical tests or combinations of tests that demonstrated strong or moderate diagnostic performance. This information could potentially be used to form future clinical prediction rules or to guide future research. We found that the PPP test strongly excluded radiologically occult hip fractures and that the hip abduction sign strongly diagnosed sarcoglycanopathies in patients with known muscular dystrophies. In addition, we identified a number of tests with moderate usefulness for diagnosing and/or excluding hip fractures, symptomatic osteoarthritis and loosening of components post-THA.

While some of our results are promising at face value, the raw data need to be considered in more detail.

Firstly, we may have overstated the utility of the PPP test, since our conclusions are based primarily on a single study by Tiru et al. [26]. Two other studies recruiting smaller populations [11, 13] also employed the principle of osteophony when testing for hip fractures and found only moderate diagnostic utility. We did not pool the data from these studies because they tested for radiologically apparent fractures, and the Bartford test employed by Bache and Cross [13] auscultated for sound transmitted by a tuning fork rather than by percussion.

The hip abduction sign may also not perform as strongly as our results suggest, because Khadilkar and Singh [20] relied on retrospective testing of patients with known diagnoses of variable duration and severity. It is therefore possible that the recruited sample did not reflect the population seen in clinical practice. Khadilkar and Singh’s [20] findings need to be confirmed prospectively in a pre-diagnosis setting.

There was considerable uncertainty about the true diagnostic performance of some of the moderately useful physical tests because of the small sample populations recruited in the primary studies [11, 13, 24–26]. Further testing in large samples would help to determine whether these tests should be considered for inclusion in future clinical prediction rules.
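The effect of sample size on this uncertainty can be illustrated with hypothetical 2×2 tables that share the same sensitivity and specificity (both 0.9) but differ in the number of participants in each reference-test group. The sketch assumes a log-method 95% confidence interval as one common choice; the primary studies may have used other interval methods, and the function name and counts are invented for illustration.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% standard normal quantile


def positive_lr_ci(tp: int, fn: int, fp: int, tn: int) -> tuple:
    """Positive LR with a log-method 95% CI (all cells assumed non-zero)."""
    diseased, healthy = tp + fn, fp + tn
    lr = (tp / diseased) / (fp / healthy)
    se_log = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    return lr, lr * math.exp(-Z_95 * se_log), lr * math.exp(Z_95 * se_log)


# Hypothetical tables: sensitivity = specificity = 0.9 at every sample size,
# with equal numbers of reference-positive and reference-negative participants.
for per_group in (10, 100, 1000):
    tp = tn = round(0.9 * per_group)
    fn = fp = per_group - tp
    lr, low, high = positive_lr_ci(tp, fn, fp, tn)
    print(f"{per_group:>5} per group: LR+ {lr:.1f} (95% CI {low:.2f} to {high:.2f})")
```

Under these assumptions the interval around a positive LR of 9 narrows from roughly 1.4 to 58 with 10 participants per group to roughly 7.5 to 10.9 with 1000 per group, so a small study can leave a test's classification highly uncertain.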

While we acknowledge that previous hip test reviews have found much of the literature to be methodologically flawed, we did not use cumulatively scored quality assessment tools to analyze our data, as the implications of these numerical values are not clear [27]. Instead, we used our methodological validity criteria to provide a minimum standard that served our primary purpose: to identify tests with strong and moderate diagnostic performance for use in clinical practice. Although our criteria are generally consistent with quality assessment tools and have been empirically associated with design-related bias [28], we acknowledge that they do not eliminate all bias and that significant shortcomings remain in the literature. We believe our criteria represent a reasonable compromise for the sake of drawing basic conclusions. That said, since our criteria have not been independently validated, we have reported data from excluded studies in Additional file 3 where complete 2×2 contingency tables could be formed, and in Additional file 4 for the remaining studies and case reports. There were some discrepancies between this review and those previously published. In some instances these were explained by calculation errors; in others, we found that the primary study contained insufficient information to construct 2×2 contingency tables for calculating diagnostic performance.

Conclusions

There is valid evidence for the diagnostic performance of only a small proportion of physical tests of the hip in routine clinical practice. Two tests demonstrated strong diagnostic utility: the patellar-pubic percussion test for excluding radiologically occult hip fractures and the hip abduction sign for diagnosing sarcoglycanopathies in patients with known muscular dystrophies. In addition, we identified a number of tests with moderate usefulness for diagnosing and/or excluding hip fractures, symptomatic osteoarthritis and loosening of components post-THA. The primary studies from which our data are derived contain methodological flaws that bias their results. Future studies should recruit larger and more representative populations and allow for construction of complete 2×2 contingency tables.