This section demonstrates the core procedures for processing and analyzing in situ, 3D crystallographic datasets. A typical workflow of the function package is illustrated in Fig. 2, organized as a set of modules that group the algorithms according to their function. In time-dependent studies such as ours, the scanned domains and/or grain orientations are likely misaligned between consecutive time-steps, which complicates downstream data analysis. Thus, after importing the HDF5 data, our workflow starts with the alignment of the volume data via genetic optimization to define the common scope (intersection volume) for further analysis. Within this scope, grain cleanup procedures filter unreliable features, such as incorrectly indexed voxels that inevitably appear during data collection. The grains are processed according to three thresholds: angular, volumetric, and completeness. The cleaned data can be visualized in 3D or cross-sectionally in 2D with different color schemes, depending on user preference. Data analysis functions provide various capabilities for the statistical analysis of the entire polycrystalline aggregate or of individual grains. Additionally, for time-resolved data, our toolbox offers grain tracking via combinatorial optimization. We step through each of these modules using the two LabDCT reconstructions as a test case.
Data Processing
Volume Alignment
A geometric deviation of the sample, comprising translation and rotation, arises between two time-steps (denoted \(t_{1}\) and \(t_{2}\) hereafter) when the sample is repeatedly removed from and remounted on its holder. Translation introduces a spatial drift between corresponding grains in the two datasets, while rotation alters the perceived crystallographic orientation of the grains between time-steps \(t_{1}\) and \(t_{2}\). Both transformations thereby mislead further analysis such as grain tracking. Thus, rigid-body alignment (registration) of the sample volume is the first step of our pipeline.
We solve the registration problem by minimizing the misfit between volumes through a genetic algorithm (GA). The volume is defined as the set of all voxels within the 3D sample; the misfit is the number of voxels not shared between the two volumes \(t_{1}\) and \(t_{2}\), divided by the total number of voxels. Six independent parameters are determined during the alignment procedure: three translation components \(\left( {T_{x} , T_{y} , T_{z} } \right)\) and three rotation angles \(\left( {R_{x} , R_{y} , R_{z} } \right)\), assuming a purely isometric transformation. Computing and comparing the misfit for all possible transformations would require an exhaustive search over this six-dimensional parameter space. Instead, we achieve a better alignment in a much shorter amount of compute time via GA; its detailed operating principle is discussed below. After the six parameters are optimized by GA, a transformation matrix is generated to align \(t_{2}\) onto \(t_{1}\), where \(t_{1}\) is the reference state. The application of the obtained transformation matrix to scalar data is illustrated in Fig. 3a, where volumes \(t_{1}\) and \(t_{2}\) are shown in blue and red, respectively. Before alignment (top left), a large misfit is observed. After alignment (top center), not only the outermost contour of the sample volume but also the orientation and location of a pore inside the volume (see arrow) align closely. Subsequently, the shared (intersection) volume is determined (top right), which serves as a “mask” to ensure that all subsequent comparisons between \(t_{1}\) and \(t_{2}\) are carried out over the same region-of-interest.
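To make the objective concrete, the following is a minimal sketch of such a misfit function in MATLAB, assuming the two volumes are stored as binary arrays `vol1` and `vol2` of identical size and that the Image Processing Toolbox functions `affine3d`, `imref3d`, and `imwarp` are used for resampling; the parameter ordering, rotation composition, and rotation center are illustrative assumptions rather than the toolbox's exact implementation.

```matlab
function f = volumeMisfit(p, vol1, vol2)
% Misfit between a reference volume VOL1 and a rigidly transformed VOL2.
% p = [Tx Ty Tz Rx Ry Rz]: translation in voxels, rotation in degrees.
% Illustrative sketch; in practice the rotation should be taken about the
% volume center (handled via spatial referencing), which is omitted here.

% Elemental rotations about the x-, y-, and z-axes
Rx = [1 0 0; 0 cosd(p(4)) -sind(p(4)); 0 sind(p(4)) cosd(p(4))];
Ry = [cosd(p(5)) 0 sind(p(5)); 0 1 0; -sind(p(5)) 0 cosd(p(5))];
Rz = [cosd(p(6)) -sind(p(6)) 0; sind(p(6)) cosd(p(6)) 0; 0 0 1];
R  = Rz * Ry * Rx;                         % assumed composition order

% 4x4 affine matrix in MATLAB's row-vector (post-multiply) convention
A = [R' zeros(3,1); p(1) p(2) p(3) 1];
tform = affine3d(A);

% Resample vol2 onto the voxel grid of vol1 with nearest-neighbor interpolation
moved = imwarp(uint8(vol2), tform, 'nearest', ...
               'OutputView', imref3d(size(vol1))) > 0;

% Misfit: fraction of voxels not shared between the two volumes
f = nnz(xor(logical(vol1), moved)) / numel(vol1);
end
```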
Based on the Euler rotation matrix described by the three rotation angles \(\left( {R_{x} , R_{y} , R_{z} } \right)\), the crystallographic orientations of the grains are also updated. Figure 3b shows the registration of grain orientations, where color represents the crystallographic direction parallel to the specimen height (\(z\)-direction). The orientations of matching grains become consistent after their Rodrigues vectors are updated. For example, the grains indicated by white arrows are a pair of matching grains; before the update, they appear cyan and green, respectively. Once the Rodrigues vectors are updated, the grain color at \(t_{2}\) becomes cyan (bottom right), closely matching the grain at \(t_{1}\). To quantify the accuracy of the orientation alignment, we calculated the average misorientation of the matching grains before and after the update: the average misorientation angle decreases from 7.56 ± 0.13° to 2.63 ± 0.59°.
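The orientation update itself reduces to composing each grain's orientation with the rotation found during registration. A minimal sketch is given below; the helper conversions between Rodrigues vectors and rotation matrices follow the standard definitions, while the composition order (pre- versus post-multiplication of the sample rotation) depends on the orientation convention of the reconstruction software and is therefore an assumption here.

```matlab
function rodNew = updateOrientations(rodOld, R_sample)
% Update grain orientations after registration (illustrative sketch).
% rodOld:   N-by-3 array of per-grain Rodrigues vectors at t2
% R_sample: 3x3 rotation matrix built from (Rx, Ry, Rz)
rodNew = zeros(size(rodOld));
for i = 1:size(rodOld, 1)
    g  = rod2rotm(rodOld(i, :));      % orientation matrix of grain i
    gN = g * R_sample';               % assumed convention; may be R_sample*g
    rodNew(i, :) = rotm2rod(gN);
end
end

function R = rod2rotm(r)
% Rodrigues vector (r = tan(theta/2)*axis) to rotation matrix
K = [0 -r(3) r(2); r(3) 0 -r(1); -r(2) r(1) 0];   % skew-symmetric form of r
R = eye(3) + 2/(1 + dot(r, r)) * (K + K*K);
end

function r = rotm2rod(R)
% Rotation matrix back to a Rodrigues vector (valid away from 180° rotations)
r = [R(3,2)-R(2,3), R(1,3)-R(3,1), R(2,1)-R(1,2)] / (1 + trace(R));
end
```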
Since the quality of the volume alignment is evaluated solely by the degree of misfit, alignment can be formulated as an optimization problem. GA has several advantages over other optimization engines: it is generally effective in optimizing a function with many local minima since it does not require a good starting estimate, and it is flexible in that it places no constraint on the form of the objective function [23, 24]. Due to these merits, GA has been employed to register 2D and 3D data [24,25,26]. In this work, the ga function from MATLAB's Global Optimization Toolbox is employed [27]. Drawing from Darwin's theory of natural selection, GA begins with a randomly generated set of individuals (rigid transformations in our case), known as the population, at the first generation (see Fig. 4). Parallel computation is used to calculate the fitness (here, the misfit) between the transformed volume at \(t_{2}\) and the volume at \(t_{1}\). Individuals with relatively low misfit are then selected from the current population, and their genomes are modified and recombined to produce the next generation. We set the maximum number of generations to 25, although this may require some tuning based on the size of the search space. If the lowest misfit computed within the 25 generations falls below the user-specified misfit tolerance, the algorithm terminates and outputs the corresponding six parameters. If the calculated misfit never drops below the threshold, another GA run of 25 generations with twice the population size is triggered; a larger population gives more opportunity to reach the global minimum rather than a local one. Finally, GA outputs the six parameters corresponding to the lowest misfit. Because GA is “embarrassingly parallel” and converges to a high-quality solution after only a few generations through a number of bio-inspired operators, the computation time is low and the accuracy of alignment is high. In practice, parent selection is done by stochastic universal sampling, mutation via a Gaussian distribution, and crossover through scattered blending. Further details regarding GA can be found in Ref. [27]. In contrast, it is difficult to improve the accuracy of a “brute force” search owing to its finite step size. The accuracy and efficiency of the two approaches were compared on a workstation with an Intel(R) Xeon(R) E-2176M CPU and 64 GB RAM. Between volumes \(t_{1}\) and \(t_{2}\), the angular constraints were set to ± 7° for each of the three rotation angles \(\left( {R_{x} , R_{y} , R_{z} } \right)\) and the misfit tolerance to 2% for both the GA and the “brute force” approach. The former took 657 s for body alignment with a misfit of 2.42%, while the latter took 1175 s with a misfit of 2.61%. This result indicates that GA can significantly improve the automated registration of 3DXRD data, achieving a better alignment in a much shorter amount of time.
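As a sketch of how the optimization might be wired together using the ga function, the snippet below couples the volumeMisfit objective from above with the ±7° angular bounds and the 2% misfit tolerance mentioned in the text; the translation bounds, initial population size, and restart cap are assumed values.

```matlab
% Registration via genetic algorithm (illustrative sketch; relies on the
% volumeMisfit function sketched earlier).
fitness = @(p) volumeMisfit(p, vol1, vol2);

lb = [-50 -50 -50 -7 -7 -7];    % lower bounds: translation (voxels), rotation (deg)
ub = [ 50  50  50  7  7  7];    % translation bounds are assumed values
tol         = 0.02;             % misfit tolerance (2%)
popSize     = 50;               % initial population size (assumed)
maxRestarts = 4;                % restart cap (assumed)

for attempt = 1:maxRestarts
    opts = optimoptions('ga', ...
        'PopulationSize', popSize, ...
        'MaxGenerations', 25, ...
        'FitnessLimit',   tol, ...     % stop early once misfit drops below tolerance
        'UseParallel',    true);
    [p, bestMisfit] = ga(fitness, 6, [], [], [], [], lb, ub, [], opts);
    if bestMisfit <= tol
        break                          % converged within 25 generations
    end
    popSize = 2 * popSize;             % otherwise retry with twice the population
end
% p = [Tx Ty Tz Rx Ry Rz] corresponding to the lowest misfit found
```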
Grain Cleanup
The cleanup module aligns the data based on the transformation matrices outputted above and further processes them to exclude unreliable features within the intersection volume. Upon importing the raw (i.e., as-collected) 3DXRD data together with the alignment matrices and mask array, data outside the masked scope are cropped, and every data field except orientation is transformed to the new frame-of-reference. Since updating crystal orientations is computationally costly—there are \({\text{O}}\left( {N^{3} } \right)\) rotations to perform, assuming a mask dimension of \(N\)—the Rodrigues vectors are updated only after the average orientation of each grain is computed by clustering voxels with similar orientations. That is, based on a user-defined angular threshold (typically ≤ 1°), grains with very small misorientation angles are grouped into a single grain. This order of operations greatly reduces the computational load without compromising the accuracy of the orientation alignment.
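The clustering step hinges on the misorientation angle between two orientations; a minimal helper is sketched below, reusing the rod2rotm conversion from the alignment sketch. Crystal symmetry is ignored for brevity; a full implementation would take the minimum angle over all symmetry operators.

```matlab
function theta = misorientationAngle(r1, r2)
% Misorientation angle (in degrees) between two orientations given as
% Rodrigues vectors. Illustrative sketch: crystal symmetry is ignored.
g1 = rod2rotm(r1);                    % conversion helper sketched earlier
g2 = rod2rotm(r2);
dg = g1 * g2';                        % misorientation matrix
c  = (trace(dg) - 1) / 2;
theta = acosd(min(max(c, -1), 1));    % clamp to [-1, 1] for numerical safety
end
```

Voxels (or clusters) whose mutual misorientation angle falls below the angular threshold would then be merged into a single grain.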
After clustering grains and updating the average grain orientations, the data are further processed to remove small grains. Any grain composed of fewer voxels than a preset volume threshold is considered noise and treated as an unindexed region. The rationale behind this procedure is to retain statistically significant grains and not artificially inflate the grain statistics. The volume threshold is determined based on the spatial resolution of the reconstruction data (10 μm for LabDCT). Finally, grains with a lower average completeness than a preset completeness threshold are considered unreliable and marked as unindexed. Low-completeness grains are often located near the edges of the LabDCT aperture, where grains may lie partially outside the illuminated field-of-view. Consequently, their diffraction patterns are partially occluded by the aperture, resulting in a low reconstruction completeness in forward-modeling simulations.
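Assuming the cleaned data are held as a 3D label array grainIds (0 denoting unindexed voxels) and a per-voxel completeness array, the two filters might be applied as sketched below; the variable names and threshold values are assumptions.

```matlab
% Apply the volume and completeness thresholds (illustrative sketch).
% grainIds:     3D array of grain labels (0 = unindexed)
% completeness: 3D array of per-voxel completeness values
volThresh  = 30;     % minimum grain size in voxels (assumed value)
compThresh = 0.30;   % minimum average completeness (assumed value)

ids = unique(grainIds(grainIds > 0));
for k = 1:numel(ids)
    mask = (grainIds == ids(k));
    tooSmall      = nnz(mask) < volThresh;
    tooIncomplete = mean(completeness(mask)) < compThresh;
    if tooSmall || tooIncomplete
        grainIds(mask) = 0;           % demote the grain to unindexed
    end
end
```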
Outputs of this module include basic measurements of the processed grains: grain volume is expressed as the total number of voxels; grain orientation as the average Rodrigues vector over all voxels in each grain; and grain position as its center-of-volume, computed from the Euclidean coordinates of every voxel in the grain. Grain adjacency is also stored in the form of an \(M\)-by-2 array of neighboring pairs that meet at a grain boundary, where \(M\) is the number of unique pairs. Grains adjacent to the free surfaces of the sample are designated as “exterior” grains and those in the bulk as “interior.”
Visualization
The segmented grain surfaces are meshed, i.e., represented as a set of triangles and vertices. Triangulation is accomplished via MATLAB’s built-in Marching Cubes routine. To eliminate the “staircasing” artifacts that result from the triangulation, we smooth the mesh to better reflect the physical grain shape. In particular, we use Laplacian smoothing, which employs the normalized curvature operator as weights for smoothing in the direction normal to the mesh interface. In practice, we apply only a few iterations of mesh smoothing in order to reduce artifacts while preserving the integrity of the interface.
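A schematic of the meshing and smoothing steps is given below. MATLAB's isosurface performs the marching-cubes triangulation of a single grain's binary mask; the smoothing shown here uses simple uniform Laplacian weights as a stand-in for the curvature-weighted, normal-direction smoothing described above, so it illustrates the idea rather than reproducing the exact scheme.

```matlab
% Mesh and smooth one grain surface (illustrative sketch).
mask = (grainIds == goi);                 % binary mask of a grain-of-interest
fv   = isosurface(double(mask), 0.5);     % marching-cubes triangulation
F = fv.faces;
V = fv.vertices;

% A few iterations of uniform-weight Laplacian smoothing (schematic stand-in
% for the curvature-weighted smoothing used in the toolbox).
nIter  = 3;
lambda = 0.5;                             % relaxation factor (assumed)
for it = 1:nIter
    Vnew = V;
    for v = 1:size(V, 1)
        nbr = unique(F(any(F == v, 2), :));   % vertices sharing a face with v
        nbr(nbr == v) = [];
        if ~isempty(nbr)
            Vnew(v, :) = (1 - lambda)*V(v, :) + lambda*mean(V(nbr, :), 1);
        end
    end
    V = Vnew;
end
```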
Different modes of mesh coloring are available based on user preference. For instance, the grains can be colored according to their crystallographic orientation, topology (i.e., number of grain neighbors), volume, or average completeness. Figure 5 illustrates these different representations of the \(t_{1}\) volume. Note that the average completeness of many grains is close to 0.45 because the reconstruction of this particular dataset was executed with a tolerance level of 0.45, meaning that voxel indexing concluded once a completeness of 0.45 was achieved. Grains located on the topmost surface of the sample show a lower completeness than those located below because they lie partially outside the illuminated field-of-view. The bottom surface of the sample is not shown because it lies outside the intersection mask between volumes \(t_{1}\) and \(t_{2}\) (see also Fig. 3a). Visualization of individual 2D slices along the specimen \(z\)-direction is also available under the same color schemes, demonstrating the versatility of our function package in handling different data shapes.
Data Analysis
Simple Metrics
Direct imaging of 3D microstructure allows for the characterization of various indicators of microstructure evolution, including grain size, shape, and topology. From planar sections, these metrics can only be estimated via quantitative stereology [28]. To our benefit, these parameters can be measured directly from 3DXRD without any averaging or interpolation. In the analysis module, we provide some basic statistics at the grain level. These include
Grain volume, see, e.g., Fig. 6a corresponding to the \(t_{1}\) volume. A wide range of grain sizes is captured in the grain size distribution, from 33 voxels (4.3 × 10³ μm³) to 20,298 voxels (2.5 × 10⁶ μm³).
Grain topology, Fig. 6b. An accurate assessment of topology is limited by the finite sample size [29], meaning that the number of neighbors of the “exterior” grains may be underestimated compared to those in the specimen “interior.” To account for this potential bias, we distinguish between the topologies of “interior” and “exterior” grains.
Grain morphology, Fig. 6c. Sphericity (\(\varPsi\)) is defined as the ratio of the surface area of a sphere having the same volume as the grain to the actual surface area of the grain; that is, \(\varPsi = \frac{\pi^{1/3} \left( 6V_{g} \right)^{2/3}}{A_{g}}\), where \(V_{g}\) is the grain volume and \(A_{g}\) is the grain surface area. The former is outputted from above, while the latter is determined as the sum of the areas of the triangles adorning the grain surface, \(A_{g} = \sum\nolimits_{i = 1}^{F} A_{tri}^{i}\), where \(A_{tri}^{i}\) is the area of triangle \(i\) and \(F\) is the total number of triangle faces. The area of each triangle is computed as \(A_{tri}^{i} = \frac{1}{2}\left\| \vec{e}_{12}^{\,i} \times \vec{e}_{13}^{\,i} \right\|\), where \(\vec{e}_{jk}^{\,i}\) is the edge vector from vertex \(j\) to vertex \(k\) of triangle \(i\) (see the sketch after this list). The vast majority of grains at time-step \(t_{1}\) show a relatively high compactness (\(\varPsi \to 1\)), which is expected for a recrystallized system.
Grain misorientation, Fig. 6d. Misorientation \(\Delta g\) is formally defined as \(\Delta g = g_{i} g_{j}^{T}\), where \(g_{i}\) and \(g_{j}\) are the orientation matrices corresponding to the grain-average orientations (stored as Rodrigues vectors) determined in the cleanup module above. The histogram weights in the misorientation distribution are the grain boundary areas, found by summing the areas of all triangles along each boundary. For comparison, we show the distribution expected for a material with uniformly distributed misorientations. The results for the \(t_{1}\) data indicate a near-random distribution of grain boundaries.
Grain texture, Fig. 6e. Shown is the inverse pole figure (IPF) of all grains in the \(t_{1}\) volume. It can be seen that the sample has no obvious texture at this particular time-step.
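As referenced in the morphology item above, the sketch below computes the grain surface area from the smoothed triangulation (faces F and vertices V from the visualization module) and the resulting sphericity; the grain volume Vg is assumed to be expressed in the same (voxel) units.

```matlab
% Sphericity of one grain from its surface mesh (illustrative sketch).
% F, V: faces and vertices from the meshing step; Vg: grain volume in voxels.
e12 = V(F(:,2), :) - V(F(:,1), :);                    % edge vectors 1->2
e13 = V(F(:,3), :) - V(F(:,1), :);                    % edge vectors 1->3
triArea = 0.5 * vecnorm(cross(e12, e13, 2), 2, 2);    % area of each triangle face
Ag  = sum(triArea);                                   % total grain surface area
Psi = pi^(1/3) * (6*Vg)^(2/3) / Ag;                   % sphericity (-> 1 for a sphere)
```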
Multimodal Analysis
The integration of multiple imaging modalities enables us to investigate correlations between various features, thereby providing an in-depth understanding of the underlying microstructure. For instance, in the multimodal analysis module, we correlate the positions of grain boundaries (retrieved via LabDCT) with those of secondary features (observed via ACT). The ACT data are assumed to be registered to the LabDCT data through the functions in the alignment module. The secondary features are (in our case) micrometer-scale θ-Al2Cu particles whose locations in the microstructure are given as centroid coordinates. The user may specify a grain-of-interest (GOI) and a distance threshold to determine which particles in the particle cloud are adjacent to the GOI, along with the corresponding grain-to-particle distances. We visualize the particles within a two-voxel distance threshold in Fig. 7a. To link grain boundaries to secondary features, we have developed a new algorithm, summarized as follows: (1) for each triangle face along the grain boundaries, we calculate its centroid; (2) we find the nearest-neighbor distance between the face centroid and the particle locations; and (3) if this distance is less than the threshold, we conclude that the particle lies on or sufficiently close to the triangle face.
Step (2) above could be accomplished by calculating the Euclidean distance between each particle and each mesh triangle and then sorting the results in ascending order of distance. This approach would necessitate \(N \times M\) calculations to correlate particle and grain boundary positions, where \(N\) is the number of particles and \(M\) the number of mesh triangles that enclose a given grain. Considering that \(M\) is on the order of 10⁵ and \(N\) is also on the order of 10⁵ (this work), the task of particle classification (as near or far from the boundary) is computationally intensive if done in such an exhaustive manner. To recognize patterns in the locations of particles with respect to grain boundaries, we instead harness the \(k\)-nearest neighbors (\(k\)-NN) algorithm, a type of “lazy” learning. \(k\)-NN lessens the computational load significantly—determining the nearest-neighbor particles in seconds—by using a so-called \(k\)-d tree to narrow the search space. This algorithm was previously used to measure the local velocities of solid–liquid interfaces in dynamic, synchrotron-based CT experiments [30].
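A minimal sketch of steps (1)–(3) is given below, using knnsearch from the Statistics and Machine Learning Toolbox, which builds a Kd-tree for Euclidean nearest-neighbor queries by default; the variable names and the two-voxel threshold mirror the description above, while everything else is an assumption.

```matlab
% Correlate particles with grain-boundary faces via k-NN (illustrative sketch).
% F, V:      boundary mesh of the grain-of-interest (faces, vertices)
% particles: N-by-3 array of particle centroid coordinates

% Step (1): centroid of each triangle face
faceCentroids = (V(F(:,1),:) + V(F(:,2),:) + V(F(:,3),:)) / 3;

% Step (2): nearest particle to each face centroid (Kd-tree search)
[nearestIdx, dist] = knnsearch(particles, faceCentroids);

% Step (3): particles lying within the distance threshold of a face
distThresh = 2;                                       % two-voxel threshold
onBoundary = unique(nearestIdx(dist <= distThresh));  % indices of boundary particles
```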
Provided that the grain misorientations are known (Fig. 7b), we can measure the particle-associated misorientation distribution (PMDF), among other interrelationships [31]. The PMDF is defined as the fraction of secondary features (here, particles) that are located on or in the vicinity of grain boundaries within a specific range of misorientation angles. Figure 7c shows that the distribution of particles does not follow the distribution of grain boundaries, which would be expected if the particle–boundary correlations were truly random (i.e., if the density of particles per unit boundary area were constant).
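Given a misorientation angle for every boundary face, the PMDF can be tabulated by assigning each particle to its nearest face and binning, as sketched below; the reversed k-NN query, bin edges, and normalization are assumptions.

```matlab
% Particle-associated misorientation distribution (illustrative sketch).
% faceCentroids: centroids of all grain-boundary faces in the volume
% faceMisori:    misorientation angle (deg) of the boundary each face belongs to
% particles, distThresh: as in the previous sketch

[faceIdx, d] = knnsearch(faceCentroids, particles);   % nearest face per particle
near = d <= distThresh;                               % boundary-associated particles

edges = 0:5:65;                                       % misorientation bins (assumed)
pmdf  = histcounts(faceMisori(faceIdx(near)), edges, ...
                   'Normalization', 'probability');   % fraction of particles per bin
```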
Grain Tracking
In this module, we define a mapping between experimental time-steps, allowing for the analysis of individual grains as time progresses. We use two key parameters for grain tracking: crystallographic misorientation and physical distance. The crystallographic orientation of a given grain should not change over time, provided the sample is fully recrystallized, and its location within the microstructure should likewise not change too drastically. Figure 8 illustrates our approach for grain tracking. Under these two constraints, we search for a matching grain at a future time-step (\(t + \Delta t\)) within a local neighborhood of the grain at the current time-step \(t\). The neighborhood is defined as the smallest cuboid that encapsulates the grain plus a user-defined padding (in number of voxels). Any grain included in this cuboidal scope is considered to be within the local neighborhood. For instance, the gray-colored region in Fig. 8 illustrates the cuboidal scope of grain \(m\) with the default padding of two voxels. Care must be taken in choosing the padding, since too large a value may lead to incorrect grain assignments and too small a value may fail to capture the matching grain. Any grain that is at least partially contained in this cuboidal scope at time-step \(t + \Delta t\) is labeled as a candidate grain.
For each candidate grain in the neighborhood, we tabulate its distance and misorientation. The distance \(\Delta d_{nm}\) is that between the centroid of grain \(m\) at time-step \(t\) and the centroid of candidate grain \(n\) at \(t + \Delta t\). The maximum (threshold) distance is set to the half-diagonal length of the cuboidal neighborhood. Similarly, the misorientation angle \(\Delta \theta_{nm}\) is that between grain \(m\) at time-step \(t\) and grain \(n\) at time-step \(t + \Delta t\); the maximum allowable misorientation (threshold) is a user-defined value. In theory, the misorientation between two datasets should be zero if the sample is perfectly registered and there are no grain rotations, yet this is often not the case due to slight misalignments between datasets (see “Data Processing” section). These two metrics are combined linearly to formulate a cost function \(J_{nm}\) associated with the assignment of grain \(n\) to grain \(m\),
$$J_{nm} = c \Delta d_{nm} + \left( {1 - c} \right)\Delta \theta_{nm} ,$$
(1)
where \(c\) is a scalar quantity (ranging from zero to one) that reflects the importance of the distance criterion relative to the misorientation criterion. Rohrer uses a similar formulation of the cost function [32]. The problem of grain tracking is then to find the lowest-cost assignment of grains from one time-step to the next. To solve this assignment problem in polynomial time, we employ the Hungarian algorithm (also known as the Kuhn–Munkres algorithm) [33, 34]. The algorithm operates on a cost matrix \(\varvec{J} = \left\{ J_{nm} \right\}_{N \times M}\) and outputs a binary matrix \(\varvec{X} = \left\{ x_{nm} \right\}_{N \times M}\), where \(x_{nm} = 1\) if and only if grain \(n\) at \(t + \Delta t\) is assigned to grain \(m\) at \(t\). The optimal assignment minimizes the total cost, \(\sum\nolimits_{n = 1}^{N} \sum\nolimits_{m = 1}^{M} x_{nm} J_{nm} \to \min\). Unlike the typical assignment problem with a square cost matrix (i.e., \(N = M\)), our cost matrix is rectangular (\(N < M\)) since the total number of grains decreases with time over the course of grain growth. However, the algorithm can be extended to rectangular arrays using the method prescribed by Ref. [34], which we have applied here. Worth mentioning is that the cost element \(J_{nm}\) for a non-candidate pair is set to infinity, preventing, for example, the assignment of disappearing grains. Elements for candidate grains are normalized by the threshold values to bring the distance and misorientation terms onto the same scale.
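The sketch below assembles the normalized cost matrix of Eq. (1) and solves the rectangular assignment with MATLAB's matchpairs, which addresses the same linear assignment problem (though via a different algorithm than the Kuhn–Munkres implementation used in the toolbox); the candidate mask, the large surrogate for infinite cost, and the unmatched-cost value are assumptions.

```matlab
% Grain assignment from the cost function of Eq. (1) (illustrative sketch).
% d(n,m):  centroid distance between grain n (t+dt) and grain m (t)
% th(n,m): misorientation angle between the same pair
% dMax, thMax: distance and misorientation thresholds; c: weight factor
% isCandidate(n,m): true if grain n lies in the cuboidal scope of grain m

J = c*(d./dMax) + (1 - c)*(th./thMax);   % normalized cost matrix, Eq. (1)
J(~isCandidate) = 1e6;                   % large finite surrogate for infinity:
                                         % forbids non-candidate pairs

costUnmatched = 10;                      % penalty for leaving a grain unmatched (assumed)
M = matchpairs(J, costUnmatched);        % rows of M are matched [n m] index pairs
```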
The result of grain tracking via the Hungarian algorithm is evaluated using two performance metrics: matching efficiency and computation time. Matching efficiency represents the percentage of grains that are successfully tracked (assigned), i.e.,
$$\text{matching efficiency}\;(\%) = \frac{\text{number of matched grains at later time-step}}{\text{total number of grains at later time-step}} \times 100$$
(2)
Grain tracking between the two LabDCT datasets achieved a matching efficiency of ~ 86%. The remaining ~ 14% of grains that were not assigned can be attributed mainly to grains that emerged into the tomographic field-of-view. Since the rod specimen is long enough to be considered an open system, “new” grains that were not captured in the previous time-step may be detected near the top and bottom of the X-ray source aperture. This is confirmed in Fig. 3b, where the top layer of the \(t_{2}\) volume contains several new grains that are not observed in the preceding \(t_{1}\) volume.
Hungarian optimization offers distinct advantages over the “brute force” solution to the assignment problem, which considers every possible assignment and thus has a complexity of \({\text{O}}\left( {N!} \right)\). To speed up the task, one may instead elect to iteratively (i) locate a grain neighborhood, (ii) compute the costs \(J_{nm}\) of all grains in the neighborhood, and (iii) assign the matching grain based on minimum cost; once a matching grain is found, the procedure moves to the next grain in the dataset. However, rather than computing the full cost matrix \(\varvec{J}\) and treating the matching problem comprehensively, this approach assigns matching grains sequentially as it passes through the \(N\) grains, introducing an inherent bias from the matching order. It is for this reason that the Hungarian algorithm offers higher matching accuracy and computational efficiency than such brute-force methods. On the same workstation with an Intel(R) Xeon(R) E-2176M CPU and 64 GB RAM, grain tracking between datasets with \(M = 308\) and \(N = 295\) grains via Hungarian optimization takes 11.44 s with a cuboidal scope padding of two voxels, a misorientation threshold of four degrees, and a weight factor \(c\) of zero. The three-step iterative matching scheme described above takes 15.69 s with the same parameters and offers a matching rate of ~ 85%. Even though the matching efficiencies of the two methods are comparable, tracking by brute force results in a few incorrect assignments for the reasons mentioned above. Even larger data sizes (e.g., fine-grained materials) should widen the performance gap between combinatorial optimization and “brute force” approaches.