1 Introduction

X-ray imaging allows non-destructive analysis of industrial manufactured objects, revealing their internal structure and composition, through dedicated X-ray Computed Tomography (X-CT) reconstruction. Despite the high interpretability of X-CT data, its computational demands and operational costs pose challenges to integration of conventional X-CT-based inspection into streamlined manufacturing processes. Conversely, X-ray radiograph analysis proves more operationally efficient in industrial setups [1].

A notable methodology for defect detection involves comparing measured X-ray radiographs of the object under scrutiny with simulated radiographs derived from a digital object, typically a computer-aided design (CAD) model [1, 2]. This, however, requires simulation of realistic X-ray projections of CAD models with a mesh projector, accounting for the multi-chromatic behavior of both the scanning system and the scanned objects. For effective projection-based analysis, knowledge of the object properties and spectral information of the scanning system are required. The object properties entail the shape and pose of the mesh, as well as its X-ray linear attenuation coefficient(s). Scanning system spectral information pertains to the X-ray source spectrum as well as the sensitivity of energy-integrating detectors. Unfortunately, these pieces of information are often not readily available to end-users, necessitating solving ill-posed estimations of the spectral behavior of the scanning system [3,4,5] or reliance on external optical systems [6], object-specific deep learning methods [7,8,9] or prior X-CT for mesh pose inference [10, 11].

Utilizing the recently developed CAD-ASTRA toolbox, which was designed for efficient X-ray mesh projection [12], we now demonstrate its value in performing multi-object pose estimation with a minimal number of radiographs. A key advantage of CAD-ASTRA is its ability to simulate projections from complex arbitrary geometries. This capability is particularly useful in industrial environments where constraints on the placement of X-ray sources and detectors exist. Moreover, one of the projectors in CAD-ASTRA is implemented as a differentiable program, allowing it to be integrated into analytical gradient-based optimisers and auto-grad routines [13, 14]. By selecting a cost function that quantifies the dissimilarity between the measured data and the simulated data, adjustments to the mesh vertices or linear attenuation of the objects can be made to minimize the cost function. In this work, we leverage CAD-ASTRA and PyTorch to register multi-object meshes using only a few X-ray projections, thereby determining their 3D position and orientation while simultaneously estimating the spectral characteristics of X-ray emission, transmission, and detection.

To show the versatility of the technique, we conducted the following experiments:

  • Registration of multi-object surface meshes with limited spectral information about the scanning system and only a rudimentary initial pose for each mesh.

  • Investigation of the performance of surface mesh pose estimation as a function of the number of X-ray projections and of the angular range within which the projections are acquired.

In this way, we show that multi-object pose estimation can be achieved without the need for an external optical system, training of deep learning models, or mesh extraction from prior X-CT, allowing seamless integration with a static multi-view radiographic setup.

2 Related Works

The pose estimation of objects within industrial settings traditionally relies on a comprehensive set of projections, through X-CT reconstructions. This technique allows for the inference of pose through the registration of CAD models with point clouds [15] or extracted meshes [2, 11]. However, the feasibility and desirability of 3D X-CT images may be limited in certain scenarios. In recent developments, efforts have been directed towards achieving pose estimation based on a low number of X-ray projections. A notable industrial approach utilizing deep learning was proposed by Presenti et al. [9], demonstrating pose estimation efficacy with as few as one projection. While such approaches exhibit promising performance in controlled environments, challenges arise from the specialized training procedures and the inherent black-box characteristics of many deep learning methods, impeding widespread adoption. Another recent contribution explored pose estimation from X-ray projections by employing a CAD model and matching 2D–3D image features through mesh projections [1]. Our approach shares similarities with this method, with the additional benefit of overcoming challenges associated with complex geometries and overlapping meshes, where image features might be barely discernible. In this study, we demonstrate multi-mesh registration from X-ray projections using a mesh projector implemented as a differentiable program, elucidating the capabilities of the proposed method.

3 Methods

CAD-ASTRA utilizes watertight triangular surface meshes to represent homogeneous volumes as enclosed entities. These meshes are combined with the positions and orientations of the source and detector to simulate X-ray acquisition, taking object attenuation into account. The initial guess for mesh position and orientation, represented by \({\textbf{p}}\), serves as a starting point, with source and detector positions assumed from scan metadata.

Each mesh \({\textbf{m}} \in {\mathbb {R}}^{N \times 3}\), comprising N vertices, undergoes transformation via a roto-translation operator \({\text {T}}_{{\textbf{p}}}: {\mathbb {R}}^{N \times 3} \rightarrow {\mathbb {R}}^{N \times 3}\). Here, \({\textbf{p}} = [\alpha , \beta , \gamma , t_x, t_y, t_z]\) defines the 3D mesh’s position and orientation in terms of Euler Z–Y–Z angles (\(\alpha \), \(\beta \), \(\gamma \)) and translations along the x, y, and z axes (\(t_x\), \(t_y\), \(t_z\)).
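The roto-translation operator can be illustrated with a minimal pure-Python sketch. It assumes the composition \(R = R_z(\alpha ) R_y(\beta ) R_z(\gamma )\), with the rotation applied before the translation and angles given in radians; the text above does not state the order of composition, so this is only one possible reading, and the function names are hypothetical rather than part of CAD-ASTRA.

```python
import math

def rot_z(a):
    """Rotation matrix about the z axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(a):
    """Rotation matrix about the y axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul3(a, b):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def roto_translate(vertices, p):
    """Apply T_p to an N x 3 list of vertices, with p = [alpha, beta, gamma, tx, ty, tz]."""
    alpha, beta, gamma, tx, ty, tz = p
    # Assumed convention: R = Rz(alpha) Ry(beta) Rz(gamma), rotation first, then translation.
    r = matmul3(matmul3(rot_z(alpha), rot_y(beta)), rot_z(gamma))
    t = (tx, ty, tz)
    return [[sum(r[i][j] * v[j] for j in range(3)) + t[i] for i in range(3)]
            for v in vertices]
```

For example, `roto_translate([[1.0, 0.0, 0.0]], [math.pi / 2, 0, 0, 0, 0, 1])` rotates the vertex onto the y axis and then shifts it by one unit along z. In an auto-differentiation framework the same arithmetic would be written with tensors so that gradients with respect to \({\textbf{p}}\) are available.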

The projection operator \({\text {Proj}}_d\) maps mesh vertices to W projection images, each image consisting of M pixels:

$$\begin{aligned} {\text {Proj}}_d: {\mathbb {R}}^{N \times 3} \rightarrow {\mathbb {R}}^{M \times W }, \end{aligned}$$
(1)

with d representing metadata linked to the meshes.

Estimation of the pose \({\textbf{p}}_k\), for the k-th of the K meshes composing the scene, is achieved through linear programming, aiming to minimize the sum of squared differences in the projection space between measured projections \({\textbf{P}} \in {\mathbb {R}}^{M \times W }\) (adjusted for dark and bright fields) and projections simulated by \({\text {Proj}}_d\):

$$\begin{aligned} \mathop {\text {arg}\,\text {min}}\limits _{{\textbf{p}}_1, \cdots , {\textbf{p}}_K} \left\Vert {\textbf{P}} - \sum _{k=1}^K {\text {Proj}}_d \left[ {\text {T}}_{{\textbf{p}}_k} ({\textbf{m}}_k) \right] \right\Vert _2^2. \end{aligned}$$
(2)

Any objective function, such as the one in Eq. (2), can be conceptually decomposed into an error function g applied to the simulated projection function \({\text {Proj}}_d\), which in turn acts on the mesh roto-translated by \({\text {T}}\). This leads to the compact formulation \(f_d(x) = g({\text {Proj}}_d({\text {T}}(x)))\). Optimising the linear programming problem in Eq. (2) through analytical gradient methods requires knowledge of all the Jacobians that make up the Jacobian of the composed function \(f_d\). The analytical gradient is then defined as:

$$\begin{aligned} \nabla f_d({\textbf{m}}) = {\text {Jac}}_{{\text {T}}}^T {\text {Jac}}_{{\text {Proj}}}^T \nabla g \left( {\text {Proj}}_d ({\textbf{m}}) \right) , \end{aligned}$$
(3)

where direct access to the Jacobian-vector product of \({\text {Jac}}_{{\text {Proj}}}\) is provided by CAD-ASTRA, whilst the other Jacobian-vector products are computed through auto-grad patterns [13, 14]. Optimising Eq. (2) with analytical gradient-based methods offers computational efficiency, particularly in scenarios involving detectors with high spatial resolution.
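As a worked instance of Eq. (3), consider a single mesh and take g to be the sum of squared differences used in Eq. (2). The symbols \({\textbf{r}}\) (residual) and \(\eta \) (step size) are introduced here for illustration only, \({\text {Jac}}_{{\text {T}}}\) is assumed to be taken with respect to the pose parameters \({\textbf{p}}\), and the plain gradient-descent update is just one possible choice of optimiser:

$$\begin{aligned} g({\textbf{y}}) &= \left\Vert {\textbf{P}} - {\textbf{y}} \right\Vert _2^2, \qquad \nabla g({\textbf{y}}) = -2\,({\textbf{P}} - {\textbf{y}}), \\ {\textbf{r}} &= {\textbf{P}} - {\text {Proj}}_d \left[ {\text {T}}_{{\textbf{p}}} ({\textbf{m}}) \right] , \\ \nabla f_d &= -2\, {\text {Jac}}_{{\text {T}}}^T \, {\text {Jac}}_{{\text {Proj}}}^T \, {\textbf{r}}, \\ {\textbf{p}} &\leftarrow {\textbf{p}} - \eta \, \nabla f_d. \end{aligned}$$

Read from right to left, the residual is first mapped back onto the mesh vertices by \({\text {Jac}}_{{\text {Proj}}}^T\) and then onto the six pose parameters by \({\text {Jac}}_{{\text {T}}}^T\), so only products with the transposed Jacobians are needed and the full Jacobian matrices never have to be formed explicitly.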

As real X-ray projections in industrial scenarios are commonly acquired with a poly-chromatic X-ray source, an accurate poly-chromatic forward model is needed to reduce the likelihood of undesired local minima in Eq. (2). Therefore, the \({\text {Proj}}_d\) operator is substituted by the poly-chromatic operator \({\text {PolyProj}}_d\), where:

$$\begin{aligned} {\text {PolyProj}}_d = \sum _{e=1}^E s_e D_e \delta _e \exp (-\sum _{k=1}^K \mu _{e,k} l_k). \end{aligned}$$
(4)

In the above formula, the X-ray attenuation of photons is based on the energy-dependent intensity of photons \(s_e\) emitted by the X-ray source, the detector response \(D_e\) and the energy \(\delta _e\) contained in the energy bin e. The product \(\Omega _e = s_e D_e \delta _e\) defines the weight of each energy bin e, contributing to the total spectral behavior \(\varvec{\Omega } = [\Omega _1,\ldots , \Omega _E]\), with E denoting the number of energy bins. For each mesh, the attenuation is measured by its spectral linear attenuation \(\mu _{e,k}\) and the path-length \(l_k\) crossed by a geometrical ray pointing at a detector pixel.
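Eq. (4) can be evaluated for a single detector pixel with the short sketch below. The function name, argument layout and units (1/cm and cm) are assumptions made for illustration; in practice the path lengths \(l_k\) would come from the mesh projector.

```python
import math

def poly_intensity(omega, mu, lengths):
    """Detected intensity for one ray following Eq. (4).

    omega[e]   : weight of energy bin e (s_e * D_e * delta_e)
    mu[e][k]   : linear attenuation of mesh k in energy bin e (assumed 1/cm)
    lengths[k] : path length of the ray through mesh k (assumed cm)
    """
    total = 0.0
    for weight, mu_e in zip(omega, mu):
        # Line integral of the attenuation over all K meshes for this energy bin.
        line_integral = sum(m * l for m, l in zip(mu_e, lengths))
        total += weight * math.exp(-line_integral)
    return total
```

With a single energy bin the expression reduces to the mono-chromatic Beer-Lambert law. With several bins, the negative logarithm of the intensity is no longer proportional to the path length, which is the beam-hardening effect that a mono-chromatic projector cannot reproduce.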

If such a model is implemented in frameworks that leverage automatic differentiation patterns, such as PyTorch, it is possible to optimise even the new linear programming problem using Eq. (4) without expensive numerical approximations. As the poly-chromatic characteristics of the X-ray source and detector, as well as the spectral linear attenuation of each scanned material, may be unknown, these parameters can be jointly optimized with a proper cost function. For experiments with no prior knowledge on the exact scanning system spectral characteristics, the linear programming problem in Eq. (2) is enriched with regularization on the first derivative of the system spectral behavior, under the assumption of smooth spectral behavior:

$$\begin{aligned} \begin{aligned} \mathop {\text {arg}\,\text {min}}\limits _{{\textbf{p}}_1, \ldots , {\textbf{p}}_K, \varvec{\Omega }} \left\Vert {\textbf{P}} - \sum _{k=1}^K {\text {PolyProj}}_d \left[ {\text {T}}_{{\textbf{p}}_k} ({\textbf{m}}_k) \right] \right\Vert _2^2 + \left\Vert \nabla _e \varvec{\Omega }\right\Vert _2^2. \end{aligned} \end{aligned}$$
(5)
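The regularisation term in Eq. (5) can be sketched as follows, assuming that \(\nabla _e\) is discretised as a forward difference between neighbouring energy bins. The discretisation and the function names are assumptions; Eq. (5) only specifies a penalty on the first derivative of \(\varvec{\Omega }\).

```python
def spectral_smoothness(omega):
    """Squared L2 norm of the forward difference of omega along the energy axis."""
    return sum((b - a) ** 2 for a, b in zip(omega, omega[1:]))

def regularised_cost(residuals, omega):
    """Data term (sum of squared projection residuals) plus the smoothness penalty."""
    data_term = sum(r * r for r in residuals)
    return data_term + spectral_smoothness(omega)
```

A flat spectrum incurs no penalty, whereas isolated spikes in \(\varvec{\Omega }\) are penalised quadratically, which favours smooth spectral estimates.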

Pose refinement through re-iteration: To mitigate cases where local minima occur due to symmetry of the object with respect to the vertical axis, an additional step of re-iteration may be introduced (with results presented separately). After the initial registration, the algorithm systematically rotates the objects around their symmetry axis and re-executes the registration procedure. This process helps in overcoming challenges posed by symmetry, enhancing the robustness of the pose estimation, especially in scenarios with highly symmetric objects.
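The re-iteration step can be sketched as a restart loop. This is only an illustration: `register` and `cost` are hypothetical callables standing in for one run of the registration and for the objective value, and both the number of rotational offsets and the rule of keeping the lowest-cost result are assumptions rather than details given above.

```python
import math

def refine_by_rotation(register, cost, pose, angle_index, n_offsets=8):
    """Re-run a registration from poses rotated about one axis; keep the lowest cost.

    register(pose) -> pose : hypothetical local optimiser (one registration run)
    cost(pose) -> float    : hypothetical objective value for a pose
    angle_index            : position in `pose` of the angle about the symmetry axis
    """
    first = register(list(pose))            # initial registration
    best, best_cost = first, cost(first)
    for i in range(1, n_offsets):
        start = list(first)
        # Rotate the registered pose about the symmetry axis before restarting.
        start[angle_index] += 2.0 * math.pi * i / n_offsets
        candidate = register(start)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Because every restart begins from the translation already found in the first run, only the rotational ambiguity is explored, which keeps the number of additional registrations small.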

4 Experiments

In this section, we present the experiments conducted on three distinct scanned objects using one or more supporting scanning elements. The scanned objects include an aluminum step-wedge, and two additive manufacturing (AM) printed samples, which are a stainless steel 316L (SS316L) cantilever and a SS316L cylinder-like object. The supporting elements, constructed from polyamide (PA12), consist of cylinders with a height of 1 cm and diameters ranging from 3 to 5 cm, as well as a hollow shaft with a 0.5 cm diameter. The FleXCT scanning system [16] was utilised, with different source-to-object and source-to-detector distances, kVp and pre-filtering settings for each scan.

The objective of these experiments was to showcase two different applications of the registration technique, one involving limited information about the scanning system and the other utilizing limited projection information.

4.1 Registration with Limited Scanning System Information

In this experiment, limited scanning system information refers to uncertainty regarding the spectral behavior of the scanning system and the objects’ poses. The registration process addresses a challenging scenario by iteratively estimating the spectral behavior and poses through the solution of the linear programming problem in Eq. (5). The initial pose of the objects and supporting elements is set as a shifted and rotated configuration from a vertically aligned state. The registration is performed using 100 projections acquired in a circular trajectory around the object.

Fig. 1 Graphical rendering of the X-ray setup of one projection, showing the initial (left) and final (right) pose of the cantilever and its supporting element

Fig. 2 Graphical rendering of the X-ray setup of one projection, showing the initial (left) and final (right) pose of the cylinder and its supporting element

4.2 Pose Estimation Performance by Reducing the Number of Projections

In this experiment, the linear programming problem from Eq. (2) based solely on projection error is employed. The initial pose of the scene’s objects is realistically estimated with the assumption that the objects and supporting elements are vertically aligned. The registration is conducted by reducing the number of projections from 100 to 10 (100, 50, 10), all acquired in a circular trajectory around the isocentre.

4.3 Pose Estimation Performance by Reducing the Angular Range

Similar to the preceding experiment, we employ the linear programming problem presented in Eq. (2), assuming that the objects are vertically aligned. In this experiment, only two projections are utilized, chosen from a complete circular scan around the isocenter. The angle between these two projections is systematically decreased (90\(^\circ \), 50\(^\circ \), 10\(^\circ \)) to assess its influence on pose estimation stability.

Fig. 3 Graphical rendering of the X-ray setup of one projection, showing the initial (left) and final (right) pose of the stepwedge and its supporting elements

Fig. 4 An X-ray projection residual showing the initial pose (top row) of the cantilever (left), the cylinder (center) and stepwedge (right) with their supporting element, and the final pose (bottom row) of the objects after the registration procedure

Fig. 5 An X-ray projection residual showing the initial pose of the cantilever and its supporting element aligned along the vertical axis, and the final pose of the objects after registration using 20, 10, and 5 projections

Fig. 6 An X-ray projection residual showing the initial pose of the cylinder and its supporting element aligned along the vertical axis, and the final pose of the objects after registration using 20 projections (a), 10 (b) and 5 (c). Systematic rotation of the objects’ poses around their vertical axes allows for mitigating the risk of local minima, resulting in more favorable registration outcomes with 20 projections (d), 10 (e), and 5 (f) compared to the configurations in (a–c)

To ensure the accuracy and repeatability of this experiment, the mesh registration is iterated five times, initiating from different projections for each scene and for each angular case. This repetition aids in investigating the consistency and reliability of the obtained results.
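Selecting the projection pairs for this experiment can be sketched as below, assuming that the complete circular scan consists of equally spaced projections over 360 degrees. The function name, the example scan size of 360 projections and the choice of starting indices are hypothetical and are not taken from the acquisition protocol.

```python
def projection_pair(n_projections, separation_deg, start_index=0):
    """Indices of two projections roughly separation_deg apart in a full circular scan."""
    step_deg = 360.0 / n_projections                 # angular spacing between projections
    offset = round(separation_deg / step_deg)        # nearest whole number of steps
    first = start_index % n_projections
    second = (start_index + offset) % n_projections  # wrap around the circle
    return first, second

# Five repetitions of the 90-degree case, each starting from a different projection.
pairs_90 = [projection_pair(360, 90, start) for start in range(0, 360, 72)]
```

The same helper covers the 50 and 10 degree cases by changing `separation_deg`; when the requested separation is not a multiple of the angular step, the nearest available projection is used.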

5 Results and Discussions

In this section, the outcomes of the experiments described in Sect. 4 are shown through 3D renderings of the initial and final spatial configurations of the objects (produced with the Mayavi library [17]) and figures of the residual errors (i.e. the difference between measured and simulated projections). For the latter, the projection-wise RMSE (root mean squared error) is computed and shown to ease comparison. Animations showing the evolution of the residual errors throughout the optimisation steps are available online as supporting media (https://osf.io/da6p3/).
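The projection-wise RMSE can be computed as in the following sketch, which assumes that each projection is stored as a flat list of M pixel values; the function name and data layout are illustrative only.

```python
import math

def projection_rmse(measured, simulated):
    """RMSE per projection for W projections, each given as a flat list of M pixels."""
    rmse = []
    for p_meas, p_sim in zip(measured, simulated):
        squared_error = sum((a - b) ** 2 for a, b in zip(p_meas, p_sim))
        rmse.append(math.sqrt(squared_error / len(p_meas)))
    return rmse
```

One value is returned per projection, so residual images from different registrations can be compared with a single number each.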

Table 1 Pose estimation timings for each scene, varying the number of projections
Table 2 Pose estimation results for the scene with the stepwedge, presented as the rotation angle of the stepwedge relative to an estimated rotation axis
Fig. 7 An X-ray projection residual showing the initial pose of the stepwedge and its supporting elements aligned along the vertical axis, and the final pose of the objects after registration using 20, 10, and 5 projections

5.1 Registration with Limited Scanning System Information

A scaled 3D rendering, showing both the initial and final poses of the objects from Experiment 4.1, is presented in Figs. 1, 2, and 3. These figures include the X-ray source and detector, forming a digital twin of the actual scanning setup.

To assess the accuracy of the registration results, attention is directed to the residual images (Fig. 4). These images display the residuals for one of the 100 projections utilized in this experiment. During the registration procedure, the position of each mesh in space is adjusted to minimize the residual. Discrepancies that persist between the real and simulated projections may stem from object deformations, especially noticeable in the case of AM printed samples, uncertainties in the actual chemical composition of samples, and additional physical effects not simulated, such as X-ray scattering.

5.2 Pose Estimation Performance by Reducing the Number of Projections

The results of the registrations are depicted in Figs. 5, 6, and 7, showcasing initial residual errors and final errors (a–c) for three scanning scenarios with 20, 10, and 5 projections. Timings for these registrations are presented in Table 1.

For the cantilever, reducing the number of projections does not seem to limit the quality of the registration. However, for the cylinder scene, a mismatch in the identification of teeth in the bottom part of the cylinder indicates convergence to a minimum different from the one identified in the previous experiment (Fig. 4). To address this, a few more iterations of the registration algorithm, each starting from a rotational offset about the vertical axis, successfully mitigate the issues arising from the high symmetry of the cylinder. This refinement leads to more accurate results, as depicted in Fig. 6d–f.

In contrast, the registration of the stepwedge exhibits stable behavior, except for the most challenging case with only 5 projections. In this instance, the supporting straw-like object demonstrates a different pose convergence. This behavior is attributed to the near transparency of the thin plastic straw to the majority of X-ray photons produced by a 230 kVp X-ray source. The attenuation values are comparable to flat-field fluctuations observed during the scans, contributing to the pose convergence variation.

To inspect the stability of the estimated pose as a function of the number of projections, a further analysis is conducted by repeating this last scenario of the stepwedge 5 times, starting from different sets of projections. The results are summarized in Table 2, which shows the angle of rotation about the estimated rotational axis in relation to the number of projections. The results consistently indicate that lowering the number of projections increases the likelihood of ending up in undesired minima for the estimated pose parameters.

Fig. 8 An X-ray projection residual showing the initial pose of the cantilever and its supporting element aligned along the vertical axis (left side). The final poses of the objects are presented after executing the registration procedure using 2 projections with angular ranges of 90 deg (a), 50 deg (b), and 10 deg (c). Systematic rotation of the objects’ poses around their vertical axes allows for mitigating the risk of local minima, resulting in more favorable registration outcomes with different angular ranges 90 deg (d), 50 deg (e), and 10 deg (f) compared to the configurations in (a–c)

Fig. 9 An X-ray projection residual showing the initial pose of the cylinder and its supporting element aligned along the vertical axis (left side). The final poses of the objects are presented after re-executing the registration procedure using 2 projections with angular ranges of 90 deg (a), 50 deg (b), and 10 deg (c). Systematic rotation of the objects’ poses around their vertical axes allows for mitigating the risk of local minima, which resulted in different outcomes for the angular ranges 90 deg (d), 50 deg (e) and 10 deg (f)

Fig. 10 An X-ray projection residual showing the initial pose of the stepwedge and its supporting elements aligned along the vertical axis (a). The final poses of the objects are presented after re-executing the registration procedure using 2 projections with angular ranges of 90 deg (b), 50 deg (c), and 10 deg (d). Systematic rotation of the objects’ poses around their vertical axes allows for mitigating the risk of local minima, resulting in more favorable registration outcomes with different angular ranges 90 deg (e), 50 deg (f), and 10 deg (g) compared to the configurations in (b–d)

Table 3 Pose estimation timings for each scene, by using 2 projections

5.3 Pose Estimation Performance by Reducing the Angular Range

Similar to the preceding experiments, the results, presented as projection residuals, are depicted in Figs. 8, 9, and 10. Timings for these registrations are presented in Table 3. Given the significant limitation of projective information in this scenario, results are showcased for both the conventional application of the algorithm and a re-iteration of the registration procedure.

In the scanning scenarios involving the cylinder and stepwedge, the results are consistent with the findings of the previous experiments. However, in the case of the cantilever, the more challenging registration scenario results in an unrealistic positioning of the main object, noticeable for the case with an angular distance of 50 deg. The difficulty arises from a more rugged solution space, which makes it easier to fall into local minima, as demonstrated in the more challenging case with 10\(^\circ \) in Fig. 8c. Again, re-iteration of the registration algorithm leads to more accurate pose estimation, as graphically shown in the bottom row of Fig. 8.

The accuracy and stability of the registration procedure, including the re-iteration procedure, for all scenes in this experiment are extensively reported in the supplementary material, in terms of the average and standard error of the estimated pose parameters for each object. In this analysis, the registration runs 5 times with different pairs of angles. The results indicate that the estimated position is relatively stable throughout the repetitions, and the error of the pose parameters is relatively low. The maximum deviations are recorded for the stepwedge, as its distance from the source (500 mm) is significantly larger than in the other two cases (cantilever 86.68 mm, cylinder 43.33 mm). The rotational angle shows a standard error of 1.76\(^{\circ }\) across the repetitions, while its translation is determined with an error of 2.41 mm. Exceptions are observed for the supporting elements, as they appear in their CAD model as perfectly symmetrical around their vertical axis. As the surface mesh of the cantilever is also perfectly symmetric with respect to one of its intersecting planes, its rotation-related parameters show a higher error due to the ambiguity arising from this symmetry.
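The summary statistics over the repeated registrations can be sketched as follows. The sketch assumes that the standard error is the sample standard deviation divided by the square root of the number of repetitions, which is the usual definition but is not spelled out above; it also treats angles as plain numbers and does not handle wrap-around at 360 degrees.

```python
import math
import statistics

def mean_and_standard_error(values):
    """Mean and standard error of one pose parameter over repeated registrations."""
    n = len(values)
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 in the denominator) scaled by sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean, standard_error
```

Applied to each of the six pose parameters of each object across the five repetitions, this gives one mean and one standard error per parameter.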

6 Conclusion

In conclusion, our novel approach to 3D mesh registration in few-view industrial X-ray imaging, utilizing an X-ray mesh projector with compatibility for the ASTRA toolbox and auto-differentiation libraries like PyTorch, emerges as a resource-efficient alternative. The method, leveraging a differentiable X-ray mesh projector, has demonstrated efficacy in achieving 3D multi-mesh registration in multiple X-ray scanning scenarios.

The experiments revealed the robustness of our approach in simultaneous multiple object registration, even under poly-chromatic conditions with limited knowledge about the scanning system’s spectral characteristics or with sparse sampling in the projection domain. When using 2 projections, the highest errors were 1.76\(^\circ \) on the rotational angle and 2.41 mm on the translation, for the case of the stepwedge with a source-to-object distance of 500 mm. However, challenges surfaced, particularly when objects had a high degree of symmetry or when inaccuracies in the projection model were present. Addressing these challenges is crucial for enhancing the applicability and accuracy of our proposed methodology.

Despite the identified challenges, our method is resource-efficient: it eliminates the need for resource-intensive X-CT reconstruction and allows registration even in a fixed multi-head X-ray radiography scanning system. This study marks a significant advancement, showcasing the practicality and efficiency of our proposed methodology. As a future prospect, the method’s adaptability opens the possibility of estimating mesh deformation from X-ray projections. This potential extends the utility of our approach, making it a valuable candidate for enhancing industrial inspection workflows.