Introduction

Achieving bone union is the main treatment goal after a fracture, osteotomy, or arthrodesis. But when has a bone healed? The question is simple, but the answer is rather complicated.

In the clinic, bone union is generally assessed based on conventional radiographs and on clinical examination, such as response to weight bearing or palpation of the fracture [1]. However, assessing bone union is a rather subjective decision [2], and the lack of consensus has been extensively described by several studies [3, 4].

Assessment of bone union after a fracture, arthrodesis, or osteotomy is an important clinical consideration, and an incorrect assessment can have major negative consequences for the patient. If bone healing is overestimated, the bone may be loaded too early, resulting in a displaced fracture or failure of the osteosynthesis material. If bone healing is underestimated, unnecessary immobilization may cause stiffness, decreased muscle mass and function, and loss of productivity [5, 6]. Especially when bone union is doubtful, an objective and accurate assessment tool can support clinical decision-making.

An objective and accurate method of union assessment would also be valuable for research. It would decrease the risk of bias within studies, reduce the number of patients needed in clinical trials with bone union as the primary outcome, and make results easier to compare between studies. Bone union is a commonly used primary outcome in orthopedic studies, for instance in studies investigating therapies that stimulate bone healing after a fracture, osteotomy, or arthrodesis [7,8,9]. For the objective assessment of bone healing on radiographs, the radiographic union score (RUS) was introduced in 2012 [10, 11]. Since then, this semi-quantitative tool for assessing fracture healing has become increasingly popular as an outcome measure in clinical studies [12, 13]. However, computed tomography (CT) is considered the best method to image bone and has been shown to be superior to plain radiographs, MRI, and DEXA for assessing bone union [14,15,16,17]. Yet for CT, no gold standard exists for the objective assessment of bone union as an outcome measure. We therefore aim to develop a method to assess bone union objectively from CT.
Such a method could serve as the gold standard for bone union assessment in clinical studies, and could also be used in the clinic when bone union after a fracture, arthrodesis, or osteotomy is doubtful.

To establish an objective, clinically applicable tool for bone union assessment, we need to know which CT-derived outcomes are strongly associated with actual bone union. Actual bone union can be established by mechanical or histological testing. Because acquiring these data in clinical studies is unethical and therefore impossible, this review focuses on animal studies. The aim of this review is to identify the CT parameters that best represent actual bone union, as indicated by mechanical or histological testing.

Method

The protocol of this review was prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO; http://www.crd.york.ac.uk/prospero/; registration number CRD42020164733).

To identify all studies concerning the assessment of bone union with CT, an online search was performed on February 5, 2020, in five databases: Embase.com, Medline Ovid, Web of Science, Cochrane CENTRAL, and Google Scholar. The search strategy for Medline Ovid is presented in Table 1 and was adapted for the other databases. After eligible articles had been selected, their reference lists were checked for missed articles.

Table 1 Search strategy for Medline Ovid

After the database search, two authors (AW and CI) selected eligible articles based on predefined eligibility criteria (Table 2). In brief, we included studies that created a fracture in the appendicular skeleton of an animal. A fracture was defined as a bone gap created by an osteotomy or by impact loading. Studies of distraction osteogenesis or bony defects, defined as a hole drilled in a bone, were excluded. CT had to be performed at least 4 weeks after fracture creation to assess bone union. This period was chosen because we aimed to study more advanced fracture healing rather than its very early stages. At the same time point as CT, actual bone union had to be tested mechanically or histologically. CT parameters that could reflect bone union include, for instance, bone mineral density, bone volume, and cross-sectional area. Finally, the association between CT outcomes and mechanical or histological outcomes had to be examined statistically.

Table 2 Inclusion and exclusion criteria

First, the eligibility of studies was assessed on title and abstract, using the predefined inclusion and exclusion criteria. Second, both authors read the full text of the preselected studies and assessed their eligibility. After each round, the selections of both authors were compared, and disagreements were resolved by a third reviewer (DM).

Data were extracted from eligible studies using a predefined data extraction sheet. One reviewer (AW) extracted the data, and a second reviewer (CI) checked them. Disagreements were resolved by consensus. The extracted data covered three areas. The first was study methodology: fracture site, number of animals, animal species, use of injections stimulating bone growth, time to CT, type of CT, CT settings, volume of interest, bone threshold, whether histological and mechanical testing were performed, and which mechanical test was used. The second was outcome measures, from mechanical or histological testing and from CT. The third was the statistical associations between CT outcomes and mechanical or histological outcomes.

Risk of bias was assessed with the QUADAS-2 tool [18], which was developed for diagnostic accuracy studies. Although the tool was originally designed for human studies, we chose it because it is the most suitable available tool for the studies in this review. Two authors (AW and CI) performed the risk of bias assessment, and discrepancies were resolved by consensus.

The primary outcome of this systematic review is the strength of the association between CT-assessed outcomes and bone union as tested mechanically or histologically. These associations can be expressed as Pearson's correlation coefficients, coefficients of determination, or the strength of association in a regression model. To improve readability, all linear Pearson's correlation coefficients were squared to obtain coefficients of determination. To distinguish weak from strong relations, coefficients of determination were classified as weak (R² < 0.4), moderate (R² = 0.4–0.7), or strong (R² > 0.7).
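The squaring and classification rules described above can be expressed as a short Python sketch. The function names are hypothetical. The ranges above do not say how the exact boundary values 0.4 and 0.7 are handled, so counting both as moderate is an assumption. Squaring also discards the sign of r, so a strong negative correlation is classified the same way as a strong positive one.

```python
def r_to_r2(r: float) -> float:
    """Square Pearson's r to obtain the coefficient of determination R²."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie in [-1, 1]")
    return r * r  # squaring discards the sign (direction) of the association


def classify_r2(r2: float) -> str:
    """Label R² with the review's cut-offs (boundary handling is an assumption)."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R² must lie in [0, 1]")
    if r2 < 0.4:
        return "weak"
    if r2 <= 0.7:  # assumption: 0.4 and 0.7 themselves count as moderate
        return "moderate"
    return "strong"


# Usage with hypothetical coefficients (not taken from the included studies)
for r in (0.5, 0.8, -0.9):
    r2 = r_to_r2(r)
    print(f"r = {r:+.2f} -> R² = {r2:.2f} ({classify_r2(r2)})")
# r = +0.50 -> R² = 0.25 (weak)
# r = +0.80 -> R² = 0.64 (moderate)
# r = -0.90 -> R² = 0.81 (strong)
```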

Results

The search initially resulted in 5159 studies. After removing the duplicates, 2567 studies were screened on title and abstract, resulting in 38 potentially eligible studies. After reading the full-text of those studies, thirteen studies were included in our systematic review (Fig. 1).

Fig. 1 Flow chart of study selection

The results of the risk of bias assessment with the QUADAS-2 tool are presented in Table 3. Risk of bias was generally low in the domains ‘animal selection’ and ‘flow and timing.’ However, twelve studies did not clearly describe whether the results of the index test (CT) were interpreted without knowledge of the results of the reference test (mechanical or histological testing), and vice versa. The risk of bias in the ‘index test’ and ‘reference standard’ domains is therefore unclear.

Table 3 Risk of bias assessment with the QUADAS-2 tool

General Study Characteristics

The fracture was created by an osteotomy in eight studies [17, 19,20,21,22,23,24,25] and by impact loading in five [26,27,28,29,30]. Six studies created the fracture in the femur [19, 22, 26, 27, 29, 30], six in the tibia [17, 20, 21, 24, 25, 28], and one in the metatarsus [23]. During follow-up, eight studies used micro-CT to assess fracture healing [19, 22, 24,25,26,27, 29, 30], two studies used peripheral quantitative CT [20, 21], and three studies used (quantitative) clinical CT [17, 23, 28]. All studies performed mechanical testing, such as torsional tests [17, 19, 22, 24, 27,28,29,30], three-point bending tests [20, 23, 26], or axial tests [17, 21]. Two studies also performed histological testing [17, 20], but one of them did not correlate the histological outcomes with CT outcomes [17]. The animal species used and further study characteristics are listed in Table 4.

Table 4 General study characteristics

Linear relations between CT parameters and mechanical or histological outcomes were tested by performing Pearson’s correlation [21, 22, 25, 26, 29, 30], bivariate linear regression [17, 19, 20, 23, 24, 28], or multiple regression analysis [27, 29]. Böhm and Jungkunz (1999) also performed bivariate quadratic regression analysis [23].
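The difference between these analyses can be illustrated with a minimal Python sketch using NumPy on hypothetical data, which are not taken from any included study. For a bivariate linear regression, R² equals the square of Pearson's r, so both approaches yield the same coefficient of determination. The function name `r2_polyfit` and the data are illustrative assumptions.

```python
import numpy as np


def r2_polyfit(x, y, degree):
    """In-sample R² of an ordinary least-squares polynomial fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


# Hypothetical paired measurements for n = 12 animals (illustration only)
bmd = np.array([250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800], dtype=float)
torque = np.array([0.9, 1.0, 1.4, 1.5, 2.1, 2.4, 3.0, 3.3, 4.1, 4.6, 5.5, 6.1])

r = np.corrcoef(bmd, torque)[0, 1]           # Pearson's r
print(f"Pearson r² = {r ** 2:.3f}")
print(f"linear R² = {r2_polyfit(bmd, torque, 1):.3f}")
print(f"quadratic R² = {r2_polyfit(bmd, torque, 2):.3f}")
```

The linear model is nested within the quadratic one, so a quadratic fit can never have a lower in-sample R² than a linear fit on the same data. A higher quadratic R² in a small sample is therefore not, by itself, evidence of a truly quadratic relation.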

Parameters Generated with CT Representing Bone Union

Quantitative CT Parameters

Quantitative CT parameters that may represent bone union include bone mineral density (BMD) and the total volume of the callus (TV). The studies defined volumes of interest (VOIs) around the fracture, and the quantitative CT parameters were assessed within these VOIs. Table 5 shows the VOIs, bone thresholds, and outcome measures reported from CT, as well as the parameters assessed by mechanical and histological testing.
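The Python sketch below shows how VOI-based parameters of this kind are typically derived. It computes TV, BV, BV/TV, BMD, TMD, and BMC from a density-calibrated voxel array. It rests on several assumptions:
- voxels are isotropic;
- densities are already calibrated to mg HA/cm³;
- a single global bone threshold applies;
- BMD is averaged over the whole VOI, while TMD is averaged over mineralized voxels only (a common micro-CT convention).

The exact definitions and thresholds used by the included studies (Table 5) may differ. The function name is hypothetical.

```python
import numpy as np


def callus_parameters(density, voi_mask, threshold, voxel_size_mm):
    """Quantitative callus parameters inside a VOI (illustrative definitions).

    density: voxel array assumed to be calibrated to mg HA/cm³
    voi_mask: boolean array of the same shape marking the VOI
    threshold: bone threshold in mg HA/cm³ (study-specific; hypothetical here)
    voxel_size_mm: isotropic voxel edge length in mm (isotropy assumed)
    """
    density = np.asarray(density, dtype=float)
    voi = np.asarray(voi_mask, dtype=bool)
    values = density[voi]
    if values.size == 0:
        raise ValueError("VOI is empty")
    voxel_volume = voxel_size_mm ** 3                 # mm³ per voxel
    mineralized = values >= threshold                 # voxels classified as bone

    tv = values.size * voxel_volume                   # total callus volume (mm³)
    bv = float(mineralized.sum()) * voxel_volume      # mineralized callus volume (mm³)
    bmd = float(values.mean())                        # mean density over the whole VOI
    tmd = float(values[mineralized].mean()) if mineralized.any() else float("nan")
    bmc = bmd * tv / 1000.0                           # mg/cm³ x mm³ -> mg (1 cm³ = 1000 mm³)
    return {"TV": tv, "BV": bv, "BV/TV": bv / tv, "BMD": bmd, "TMD": tmd, "BMC": bmc}


# Toy 1 x 2 x 2 volume; the whole array is used as the VOI
demo = np.array([[[100.0, 300.0], [500.0, 700.0]]])
print(callus_parameters(demo, np.ones_like(demo, dtype=bool), threshold=400.0, voxel_size_mm=0.5))
```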

Table 5 Outcome measures on axial CT cross-sections

Biomechanical CT Parameters

Three studies calculated the polar moment of inertia from CT [23, 27, 30]. The polar moment of inertia represents the resistance of a bone cross-section to torsion and depends on the shape of the callus relative to the torsion axis. It is expressed in m⁴.
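For a single axial slice, the polar moment of inertia is J = Σ r_i² ΔA, summed over all pixels classified as bone. The Python sketch below assumes square pixels, a fixed bone threshold, and a torsion axis through the centroid of the thresholded area. These are illustrative assumptions, and the included studies may have placed the axis differently. A check against the closed-form value for a solid square of side L (J = L⁴/6) confirms the discretization.

```python
import numpy as np


def polar_moment_of_inertia(slice_density, threshold, pixel_size_m):
    """Polar moment of inertia J (m⁴) of the thresholded area of one CT slice.

    Assumes square pixels and a torsion axis through the area centroid of the
    pixels classified as bone (illustrative assumptions).
    """
    mask = np.asarray(slice_density) >= threshold
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return 0.0
    x = cols * pixel_size_m
    y = rows * pixel_size_m
    area = pixel_size_m ** 2                          # area of one pixel (m²)
    r2 = (x - x.mean()) ** 2 + (y - y.mean()) ** 2    # squared distance to the centroid
    # Parallel-axis theorem: r² * dA per pixel plus each square pixel's own a⁴/6
    return float(np.sum(r2) * area + rows.size * pixel_size_m ** 4 / 6.0)


# Solid 10 x 10 pixel square with 0.1 mm pixels, i.e. side L = 1 mm
square = np.ones((10, 10))
print(polar_moment_of_inertia(square, threshold=0.5, pixel_size_m=1e-4), (1e-3) ** 4 / 6)
```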

Three studies calculated the torsional rigidity (GJ) of the fracture from CT-derived data [19, 24, 30]. GJ describes the resistance of a bone to torsional loading and is expressed in N·m². It is calculated from the cross-sectional area and the CT-assessed bone mineral density. GJ was reported as the average over the entire VOI (GJ_AVG) [19, 24, 30] and as the value of the weakest slice in the VOI (GJ_MIN) [19, 24]. Shefelbine et al. (2005) [30] also calculated the average bending rigidity.
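The minimal Python sketch below, in the spirit of CT-based rigidity analysis, shows how the VOI-average and weakest-slice values of GJ can be obtained from a density-calibrated VOI. Each bone pixel is assigned a shear modulus derived from its density. Per-slice GJ = Σ G_i r_i² ΔA is then computed about the modulus-weighted centroid. The density–modulus power law, its coefficients, Poisson's ratio, and all function names are hypothetical placeholders, not the values or software used by the included studies.

```python
import numpy as np

# Hypothetical placeholder constants, NOT taken from the included studies:
# E = A_MPA * rho**B_EXP (rho in g/cm³, E in MPa) and G = E / (2 * (1 + POISSON)).
A_MPA = 10_000.0
B_EXP = 2.0
POISSON = 0.3


def slice_torsional_rigidity(rho_slice, threshold, pixel_size_m):
    """GJ (N·m²) of one slice: sum of G_i * r_i² * dA over pixels classified as bone."""
    rho = np.asarray(rho_slice, dtype=float)
    mask = rho >= threshold
    if not mask.any():
        return 0.0
    rows, cols = np.nonzero(mask)              # same row-major order as rho[mask]
    e_pa = A_MPA * rho[mask] ** B_EXP * 1e6    # Young's modulus in Pa
    g_pa = e_pa / (2.0 * (1.0 + POISSON))      # shear modulus in Pa
    x = cols * pixel_size_m
    y = rows * pixel_size_m
    # Torsion axis through the modulus-weighted centroid
    xc = np.sum(e_pa * x) / np.sum(e_pa)
    yc = np.sum(e_pa * y) / np.sum(e_pa)
    r2 = (x - xc) ** 2 + (y - yc) ** 2
    dA = pixel_size_m ** 2
    # Each pixel contributes G * (r² dA + a⁴/6); a⁴/6 is the pixel's own polar moment
    return float(np.sum(g_pa * (r2 * dA + pixel_size_m ** 4 / 6.0)))


def gj_avg_min(volume, threshold, pixel_size_m):
    """Average and minimum GJ over the axial slices (first axis) of a VOI."""
    per_slice = [slice_torsional_rigidity(s, threshold, pixel_size_m) for s in volume]
    return float(np.mean(per_slice)), float(np.min(per_slice))


# Toy VOI of three 10 x 10 slices (rho = 1.0 g/cm³); the middle slice is half empty
voi = np.ones((3, 10, 10))
voi[1, 5:, :] = 0.0
print(gj_avg_min(voi, threshold=0.5, pixel_size_m=1e-4))
```

In the toy VOI, the partially bridged middle slice determines the minimum, while the average is raised by the two intact slices. This is why the two summaries can diverge.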

Associations Between CT and Mechanical or Histological Testing

Quantitative CT Outcomes with Mechanical Testing

The included studies used several quantitative parameters assessed from CT to represent bone union. The results of the studies are shown in Tables 6 and 7.

Table 6 Coefficients of determination for linear associations between CT-outcome measures and mechanical or histological testing outcomes
Table 7 Coefficients of determination for quadratic associations between CT-outcome measures and mechanical or histological testing outcomes

Ten studies correlated bone mineral density (BMD) with mechanical outcomes. Six studies did not find associations with R² > 0.40 between BMD and mechanical outcomes [21, 22, 24, 26, 29, 30], whereas four studies found moderate to strong associations with BMD [17, 20, 23, 25]. Böhm and Jungkunz (1999) also found a strong quadratic association between BMD and mechanical testing [23].

Callus density (CD) was assessed by two studies, which both reported strong associations between CD and mechanical testing [23, 28].

Tissue mineral density (TMD) was assessed by two studies [24, 29]. One study reported weak associations [29], whereas the other study found moderate associations between TMD and mechanical testing [24].

For bone mineral content (BMC), two studies did not find associations with R² > 0.40 [20, 21]. One study reported a strong linear and quadratic association for BMC with mechanical testing [23].

Total callus volume (TV) was assessed by five studies. Three studies reported no or weak associations between TV and mechanical outcomes [25, 26, 29]. Two studies reported moderate associations with mechanical outcomes [22, 24].

Mineralized callus volume (BV) was assessed by six studies. Three studies reported no or weak associations for BV with mechanical outcomes [26, 29, 30]. Three studies reported moderate to strong associations between BV and mechanical outcomes [22, 24, 25].

The mineralized fraction of the callus (BV/TV) was assessed by four studies [22, 24, 26, 29], of which one study found a moderate association [26].

Cross-sectional area (CSA) was assessed by three studies and was not associated with mechanical outcomes [20, 21, 30].

Some studies investigated less common CT parameters [24, 26,27,28,29]. Of these, trabecular thickness [24] and the amount of bone across the failure surface area [24] showed associations with mechanical outcomes with R² > 0.50.

Morgan et al. (2009) and Mehta, Heyland, Toben, and Duda (2013) built regression models relating mechanical outcomes to quantitative CT parameters [27, 29]. For maximum torque, a model with TMD, BMC, and σTMD (the standard deviation of TMD) explained 62% of the variation (R² = 0.62), and a model with TMD, BV, and σTMD explained 61% (R² = 0.61) [27]. For torsional rigidity, a model with TMD, BMC, BV/TV, and σTMD explained 70% of the variation (R² = 0.70) [27]. Torsional stiffness could be predicted with a model containing strut thickness, the standard deviation of strut separation, and strut number (R² = 0.55) [29]. Torsional strength could be predicted with BMD or BV/TV, strut thickness, standard deviation, or strut separation (R² = 0.57) [29].

Quantitative CT Outcomes with Histological Testing

Augat et al. (1997) [20] was the only study that correlated CT outcomes with histological outcomes. It reported a moderate association (R² = 0.62) between minimal BMD and the histologically assessed percentage of bone in the periosteal callus. It also reported a strong association (R² = 0.71) between minimal BMD and the histologically assessed percentage of bone in the fracture gap.

Biomechanical CT Outcomes with Mechanical Testing

Polar moment of inertia was assessed by three studies. Two studies found no or weak associations between polar moment of inertia and mechanical outcomes [26, 30]. Böhm and Jungkunz (1999) reported moderate linear and quadratic associations between polar moment of inertia and mechanical testing [23].

Three studies related CT-assessed torsional rigidity to torsional rigidity assessed by mechanical testing [19, 24, 30]. All three reported moderate to strong associations between the average CT-assessed torsional rigidity and the mechanical testing results [19, 24, 30].

Shefelbine et al. (2005) reported moderate associations between CT-assessed maximum and mean bending rigidity and mechanical outcomes [30].

Data Synthesis

Overall, for two parameters, every study that investigated the parameter found moderate or strong associations: CD, assessed by two studies, and CT-assessed torsional rigidity, assessed by three studies. For BMD, TMD, BMC, TV, BV, trabecular thickness, and polar moment of inertia, 30–60% of the studies investigating the parameter found moderate or strong associations. For BV/TV, CSA, trabecular number, trabecular separation, and bone area per total area, fewer than 30% of the studies found such an association.
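The grouping used in this synthesis can be reproduced from the counts reported in the Results. The Python sketch below records, for each parameter, two counts: the number of studies reporting a moderate or strong association (R² ≥ 0.4) and the number of studies that investigated it. Parameters whose counts are not stated in the Results text, such as trabecular thickness, are omitted. The dictionary and function names are illustrative.

```python
# (studies reporting R² >= 0.4, studies investigating the parameter), as stated in the Results
COUNTS = {
    "CD": (2, 2),
    "CT-assessed torsional rigidity": (3, 3),
    "BMD": (4, 10),
    "TMD": (1, 2),
    "BMC": (1, 3),
    "TV": (2, 5),
    "BV": (3, 6),
    "Polar moment of inertia": (1, 3),
    "BV/TV": (1, 4),
    "CSA": (0, 3),
}


def synthesis_group(found: int, total: int) -> str:
    """Group a parameter by the share of studies reporting a moderate or strong association."""
    share = found / total
    if found == total:
        return "all studies"
    if 0.30 <= share <= 0.60:
        return "30-60%"
    if share < 0.30:
        return "<30%"
    return "other"


for name, (found, total) in COUNTS.items():
    print(f"{name}: {found}/{total} ({found / total:.0%}) -> {synthesis_group(found, total)}")
```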

Some parameters were assessed by only one study. Of these, CT-assessed bending rigidity and the amount of bone across the failure surface area showed moderate to strong associations.

Discussion

We aimed to identify the CT parameters most strongly associated with bone union after a fracture. The associations reported by the studies were conflicting, with the exception of CT-assessed torsional and bending rigidity and callus density.

All three studies that investigated CT-assessed torsional rigidity found moderate to strong associations. Torsional rigidity is calculated from CT-acquired data and depends on the callus density, the cross-sectional area, and the distribution of bone density within the callus [19, 30]. Virtual models of the bone are created from CT, virtual mechanical testing is performed on them, and torsional rigidity is calculated from this virtual testing [19, 30, 31]. Average torsional rigidity showed moderate to strong associations with mechanical testing in all three studies [19, 24, 30]. The results of Nazarian et al. (2010) [19] showed that minimum torsional rigidity was more strongly associated with mechanical testing than average torsional rigidity. This suggests that analyzing only the weakest segment (axial slice) of the CT images gives the strongest associations. This is plausible, because a beam under load fails at its weakest point rather than according to its average strength [19]. However, Wright et al. (2012) [24] did not find an association between minimum torsional rigidity and mechanical testing. According to Wright et al., this may be explained by their use of the tibia rather than the femur, which Nazarian et al. used [24]. Unlike the femur, the tibia narrows distally. Because torsional rigidity depends on the CSA, the minimum torsional rigidity of a tibial VOI may therefore shift to its most distal part [24]. This again illustrates that the assessment of fracture healing is complex and depends on many variables.
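Wright et al.'s explanation can be made concrete with a simplified calculation. For a solid circular cross-section, J = πr⁴/2, so torsional rigidity scales with the fourth power of the radius. A 10% smaller radius therefore lowers GJ by about 34% (0.9⁴ ≈ 0.66). The Python sketch below uses a hypothetical VOI with a linearly tapering solid circular section and uniform material, which a healing callus is not. It only shows that taper alone can place the weakest slice at the distal end.

```python
import math


def solid_circle_polar_moment(radius_m: float) -> float:
    """J = pi * r**4 / 2 for a solid circular cross-section (m⁴)."""
    return math.pi * radius_m ** 4 / 2.0


# Hypothetical VOI: radius tapers linearly from 10.0 mm (proximal) to 9.0 mm (distal)
radii = [0.010 - 0.0001 * i for i in range(11)]
G_PA = 3.0e9  # hypothetical uniform shear modulus (Pa)
gj_per_slice = [G_PA * solid_circle_polar_moment(r) for r in radii]

weakest = gj_per_slice.index(min(gj_per_slice))
print(f"weakest slice index: {weakest} of {len(radii) - 1}")
print(f"distal/proximal GJ ratio: {gj_per_slice[-1] / gj_per_slice[0]:.3f}")
# weakest slice index: 10 of 10
# distal/proximal GJ ratio: 0.656
```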

This complexity may have contributed to the conflicting results for the other parameters. For example, three studies reported fairly strong associations for BMD, whereas other studies found no association with BMD. Because of these conflicting results, the generalizability of the associations appears low. In addition, most studies in this review explored linear relations, whereas Böhm and Jungkunz (1999) showed that associations might be quadratic [23]. However, Böhm and Jungkunz (1999) was the only study investigating quadratic associations, and it was small (n = 12).

So far, CT-assessed torsional rigidity appears to be a promising parameter for bone union assessment, and several clinical studies have investigated it. CT-assessed torsional rigidity has been used successfully to predict fractures in patients with bone lesions [32,33,34]. Recently, the first clinical study using CT-assessed torsional rigidity to assess tibial fracture healing was published [35]. In this study, low-dose CT of the tibia was performed 12 weeks after surgical fixation. Software was used to create a virtual model of the fractured tibia, which was adapted to a model of an intact tibia. Virtual torsional testing was then performed on both models, yielding torsional rigidity values for the fractured and the intact tibia. Finally, the torsional rigidity of the fractured model was divided by that of the intact model. This gave a dimensionless parameter that indicates the progression of healing relative to the intact tibia [31]. Given the results of this review and the promising results of this first clinical study, CT-assessed torsional rigidity could become a useful tool for bone union assessment. At present, however, its clinical applicability is limited, because CT-based structural rigidity analysis (CTRA) requires advanced software and expertise [32]. CTRA can be performed with data from any CT scanner, but accurate bone densities are essential for the analysis. Phantoms with known bone densities should therefore be scanned together with the patient [32].
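The Python sketch below illustrates the two quantitative steps mentioned above: phantom-based density calibration and normalization of torsional rigidity to the intact bone. It assumes a linear relation between CT numbers (HU) and density across the phantom inserts. The phantom values and function names are hypothetical, and real CTRA software may implement these steps differently.

```python
import numpy as np


def fit_phantom_calibration(phantom_hu, phantom_density):
    """Least-squares line mapping CT numbers (HU) to density using phantom inserts."""
    slope, intercept = np.polyfit(np.asarray(phantom_hu, float), np.asarray(phantom_density, float), 1)
    return float(slope), float(intercept)


def hu_to_density(hu, slope, intercept):
    """Apply the phantom calibration to patient voxels."""
    return slope * np.asarray(hu, dtype=float) + intercept


def normalized_torsional_rigidity(gj_fractured, gj_intact):
    """Dimensionless ratio of fractured to intact GJ (1.0 = rigidity of the intact model)."""
    if gj_intact <= 0:
        raise ValueError("intact torsional rigidity must be positive")
    return gj_fractured / gj_intact


# Hypothetical phantom with four inserts of known density (mg HA/cm³)
slope, intercept = fit_phantom_calibration([0, 200, 400, 800], [0, 100, 200, 400])
print(hu_to_density([150, 600], slope, intercept))
print(normalized_torsional_rigidity(30.0, 60.0))
```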

This systematic review has some limitations. First, CT-assessed torsional rigidity and callus density were assessed by only a few studies (three studies for CT-assessed torsional rigidity and two for callus density). Although these studies show promising results, more studies are needed to confirm them. Parameters assessed by more than three studies naturally had a higher chance of yielding contradictory results. Nevertheless, for the more frequently investigated parameters in this review, most studies found no moderate or strong association. BMD, for instance, was investigated by ten studies, of which only four reported moderate or strong associations. Second, the statistical associations presented here come from animal studies. These results should be translated to human fractures with caution, as findings from animal studies may not hold in clinical studies [10]. For example, rodent bone remodeling differs from large-animal and human bone remodeling because rodents lack intracortical remodeling [36]. Associations for bone healing might therefore differ between rodents and large animals or humans. In addition, most studies in this review used micro-CT scanners, which have higher spatial resolution and higher radiation doses than clinical CT scanners [19, 21, 26, 27]. Clinical CT scanners might therefore be less accurate than micro-CT scanners [37]. Third, we used fairly strict inclusion criteria for this systematic review. The main reason was to keep heterogeneity between studies as low as possible, so that studies could be compared and firm conclusions drawn. Even with these strict criteria, heterogeneity between studies was high.
Studies differed in fracture location, animal species, scanning protocols, and mechanical testing protocols, which is likely to have affected the associations found. Also, four studies used drug treatments to enhance fracture healing [19, 26,27,28], and such treatments can modulate the structural and mechanical properties of the callus [38]. Because of the strict inclusion criteria, many studies were excluded during study selection, including studies that assessed bone healing with CT, mechanical testing, and histological testing. In these studies, however, the methods were used to complement each other, and their results were not compared with each other. No conclusions about the best CT parameter can therefore be drawn from them. Furthermore, the minimum follow-up time was set at 4 weeks, as we were not interested in studies that examined only the early stages of fracture healing. Because fracture healing progresses differently between animal species and depends on fracture size, it is debatable whether this period was appropriate. The associations between CT parameters and mechanical and histological outcomes might also be influenced by the stage of fracture healing, which may have varied between studies. Lastly, risk of bias was assessed with the QUADAS-2 tool. As this tool was designed for clinical studies, it may not be fully suited to preclinical studies, but no preclinical risk of bias tool exists for diagnostic studies. Most studies in this review raised concerns about risk of bias. To decrease the risk of bias in future studies, we strongly recommend interpreting the index test (CT) without knowledge of the results of the reference test (mechanical or histological testing), and describing this process in the paper.

Based on the currently available literature, density-related parameters appear to be the most promising parameters for assessing bone union after a fracture. CT-assessed torsional rigidity is particularly promising. To improve the clinical assessment of fracture healing, we encourage more high-quality clinical studies investigating the applicability of CT-assessed torsional rigidity for bone union assessment. In the future, torsional rigidity could become a widely accepted outcome measure for bone union assessment in clinical studies and in clinical practice.