Topological Data Analysis for the Characterization of Atomic Scale Morphology from Atom Probe Tomography Images
Atom probe tomography (APT) represents a revolutionary characterization tool for materials that combines atomic imaging with a time-of-flight (TOF) mass spectrometer to provide direct-space, three-dimensional, atomic-scale-resolution images of materials together with the chemical identities of hundreds of millions of atoms. It involves the controlled removal of atoms from a specimen's surface by field evaporation, followed by their sequential analysis with a position-sensitive detector and a TOF mass spectrometer. A paradox in APT is that while it provides an unprecedented level of imaging resolution in three dimensions, it is very difficult to obtain an accurate perspective of the morphology or shape outlined by atoms of similar chemistry and microstructure. The origins of this problem are numerous, including incomplete detection of atoms and the complexity of the evaporation fields of atoms at or near interfaces. Hence, unlike in scattering techniques such as electron microscopy, interfaces appear diffuse, not sharp. This, in turn, makes it challenging to visualize and quantitatively interpret the microstructure at the "meso" scale, where one is interested in the shape and form of the interfaces and their associated chemical gradients. It is here that the application of informatics at the nanoscale and statistical learning methods plays a critical role, both in defining the level of uncertainty and in helping to make quantitative, statistically objective interpretations where heuristics often dominate. In this chapter, we show how the tools of Topological Data Analysis provide a new and powerful approach in the field of nanoinformatics for materials characterization.
Keywords: Atom probe tomography · Topological data analysis · Persistent homology
The modern development of Atom Probe Tomography (APT) has opened exciting new opportunities for materials design owing to its ability to experimentally map atoms, with their chemistry, in 3D space [1, 2, 3, 4, 5, 6, 7]. However, challenges remain in accurately reconstructing the 3D atomic structure and in precisely identifying features (for example, precipitates and interfaces) from the 3D data. Because the data take the form of discrete points in some metric space, i.e., a point cloud, many existing data-mining algorithms can be applied to extract the geometric information embedded in the data. Nevertheless, these geometric-based methods have certain limitations when applied to atom probe data. We summarize below the limitations of geometric-based methods and present a data-driven approach that addresses significant challenges associated with massive point-cloud data and data uncertainty at sub-nanometer scales, and that can be generalized to many other applications.
7.1.1 Atom Probe Tomography Data and Analysis
Improvements in data collection rates, field of view, detection sensitivity (at least one atomic part per million), and specimen preparation have advanced the atom probe from a scientific curiosity to a state-of-the-art research instrument [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. While APT is a powerful technique with the capacity to gather information on hundreds of millions of atoms from a single specimen, effectively using this information presents significant challenges. The main technological bottleneck lies in handling extraordinarily large amounts of data (giga- to terabytes) in short periods of time. The key to successful scientific applications of this technology in the future will be making the handling, processing, and interpretation of such data via informatics techniques as integral a part of APT as the equipment and sample preparation.
As applied to APT, data processing and analysis involve two main phases. The first is the reconstruction of the 3D image, which identifies the 3D coordinates and chemical identity of each collected atom. The second is the extraction of useful information from the reconstructed image; for example, the identification of crystalline structures, clusters, and precipitates. Two parameters of interest must be determined during the 3D image reconstruction: the voxel size [19, 20, 21] and the elemental concentration threshold for the voxels. Normally these two parameters are determined empirically by trial and error: a value is set for the parameter, and if the expected features are visible, the image is considered correct. Once the parameters are set, they are treated as fixed values, and all subsequent analyses are based on them. There are two issues with this approach: (1) the determination of the parameter values is largely subjective, and (2) once the values are chosen, the results of all subsequent analyses are biased toward those particular values.
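The first analysis phase above, binning reconstructed atoms into voxels and computing per-voxel elemental concentrations, can be sketched as follows. This is an illustrative minimal implementation, not the reconstruction pipeline used by APT software; the atom record format `(x, y, z, species)` and the function names are assumptions made for the example.

```python
from collections import defaultdict

def voxelize(atoms, voxel_size):
    """Bin (x, y, z, species) atom records into cubic voxels.

    Returns a dict mapping voxel index -> {species: count}.
    """
    voxels = defaultdict(lambda: defaultdict(int))
    for x, y, z, species in atoms:
        idx = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[idx][species] += 1
    return voxels

def concentration(voxel_counts, species):
    """Fraction of atoms of `species` in one voxel."""
    total = sum(voxel_counts.values())
    return voxel_counts[species] / total if total else 0.0

# Toy data: four atoms in a 2 nm cube (positions in nm)
atoms = [(0.1, 0.1, 0.1, "Sc"), (0.3, 0.2, 0.1, "Al"),
         (1.6, 1.7, 1.8, "Al"), (1.9, 1.5, 1.6, "Al")]
vox = voxelize(atoms, voxel_size=1.0)
print(concentration(vox[(0, 0, 0)], "Sc"))  # -> 0.5
```

Both free parameters discussed in the text appear explicitly here: `voxel_size` controls the binning, and a threshold on `concentration` would select the voxels used in later analyses.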
7.1.2 Characteristics of Geometric-Based Data Analysis Methods
As noted in the introduction, accurately reconstructing the 3D atomic structure and precisely identifying features such as precipitates and interfaces remain open challenges [23, 24, 25, 26, 27, 28, 29, 30]. Because the data take the form of discrete points in a metric space, i.e., a point cloud, many existing data-mining algorithms can be applied to extract the geometric information embedded in the data [31, 32, 33, 34]. Nevertheless, these geometric-based methods have certain limitations when applied to atom probe data, which we summarize below.
In the category of supervised learning, many methods require prior knowledge about the data. When such prior knowledge is not available, assumptions must be made, and a bias can be introduced. For example, regression usually assumes a mathematical function relating the variables, so the conclusions drawn are biased toward the chosen function. For unsupervised learning methods, on the other hand, there is usually some parameter that must be determined for the algorithm. For example, clustering methods usually require the number of clusters (or an equivalent parameter) to be set manually; in dimensionality reduction, a common assumption is that the data reside on a lower-dimensional manifold that sufficiently represents the data, although the dimension of that manifold may not be determinable by the algorithm.
Owing to the wide range of applications, there is hardly a universal rule for determining the values of the parameters required by geometric-based methods. For a particular task, the parameters can be determined either empirically, based on the constraints of the situation at hand, or by some algorithm. In either case, the hidden assumption is that the value of the parameter is fixed once chosen. In some scenarios, it would be worthwhile to make those fixed parameters variables. This is not equivalent to supplying a set of values for the parameters and collecting all of the results, since those results are independent of each other; what is needed is a scheme that can summarize the results as the parameter changes value. The lack of variability also exists on another level: geometric-based approaches are exact, i.e., two points in a space are geometrically distinguishable as long as they do not share the same coordinates. As a result, classification algorithms, for example, determine classes using a set of hyper-boundaries that are fixed once obtained by training the algorithm.
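The parameter-sensitivity limitation can be made concrete with a minimal sketch. The friends-of-friends (single-linkage) clustering below is a stand-in for the clustering methods mentioned above, not any specific APT algorithm; the linking `radius` plays the role of the fixed parameter, and changing it changes the answer.

```python
from itertools import combinations

def cluster_count(points, radius):
    """Friends-of-friends clustering: points closer than `radius` are
    linked; return the number of connected groups (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if sum((a - b) ** 2 for a, b in zip(p, q)) <= radius ** 2:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
    return len({find(i) for i in range(len(points))})

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
print(cluster_count(pts, 0.2))  # two tight pairs -> 2
print(cluster_count(pts, 2.0))  # everything linked -> 1
```

A single fixed radius commits the analysis to one of these answers; the persistent-homology approach developed below instead records how the answer evolves as the scale parameter sweeps through all values.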
Topological methods, in contrast, have several properties that make them attractive:
- (i) topology focuses on the qualitative geometric features of an object, which are insensitive to coordinates; the data can therefore be studied without recourse to an algebraic function, so no prior assumption or parameter needs to be dealt with;
- (ii) instead of a distance metric, topology uses the looser notion of "proximity" or "neighborhood"; since proximity is less absolute than an actual metric, topology can handle scenarios where the information is less exact;
- (iii) the qualitative geometric features can be associated with algebraic structures through homology, so changes in the topology can be tracked by these algebraic structures, which is useful when assessing the impact of a parameter on the result of a given analysis.
All these properties make topological-based methods good candidates for dealing with APT data. Table 7.1 summarizes the main differences between the geometric- and topological-based methods.

Table 7.1 Comparison of geometric-based and topological-based methods

| | Geometric-based methods | Topological-based methods |
| --- | --- | --- |
| Requirement for model/assumption | Based on an algebraic model or assumptions, e.g., that the data have a certain algebraic property | No model required; no algebraic assumption on the data |
| Requirement for coordinates | Coordinate information is needed, since a metric is used | Neighborhood (proximity) is used as the metric; coordinate-free applications are possible |
| Treatment of parameters | Parameter is fixed once its value is determined; the result cannot reflect the impact of different parameter values | Parameter can be variable; the result is integrated over different parameter values |
7.2 Persistent Homology
Summary of homology classes and their corresponding qualitative geometric features for the first few dimensions:

| Homology class | Qualitative geometric feature |
| --- | --- |
| 0th | Connected components |
| 1st | 2D holes (enclosed area) |
| 2nd | 3D cavities (enclosed volume) |
| 3rd | 4D hyper-voids (enclosed hyper-volume) |
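The lowest-dimensional case can be sketched in a few lines: in a Vietoris–Rips filtration, every point is a connected component (a β0 feature) born at ε = 0, and a component dies when the growing ε links it to another. The union-find sketch below computes these death scales; it is a minimal illustration under that standard construction, not the full machinery (production analyses would use a library such as GUDHI or Ripser, and higher Betti numbers require building the simplicial complex).

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def h0_barcodes(points):
    """0-dimensional persistence: each point is born at eps = 0; a
    component dies when it merges into another as eps grows.
    Returns the sorted death scales (one component never dies)."""
    edges = sorted((dist(p, q), i, j)
                   for (i, p), (j, q) in combinations(enumerate(points), 2))
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for eps, i, j in edges:  # process edges in order of increasing scale
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(eps)  # one component dies at this scale
    return deaths

# Two well-separated pairs of points
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(h0_barcodes(pts))  # [1.0, 1.0, 9.0]: beta0 drops 4 -> 2 -> 1
```

The long bar surviving from ε = 1 to ε = 9 signals two persistent clusters, exactly the kind of scale-spanning feature the barcodes in the following sections are read for.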
7.3 Voxel Size Determination: Identification of Interfaces
- (i) for a given voxel size, apply a Gaussian kernel to each atom at its exact position;
- (ii) sum all of the Gaussian kernels to obtain an estimated density across the voxel;
- (iii) define another Gaussian by assuming every atom in the voxel is located at the center of the voxel;
- (iv) calculate the difference between the estimated density and this central Gaussian, which approximates the true density; and
- (v) define the optimal voxel size as the one with the minimum difference.
The specifics of each step are expanded below.
The estimated density in step (ii) is the standard kernel density estimate

$$\hat{f}(\mathbf{x}) = \frac{1}{n h_d^3} \sum_{i=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h_d}\right),$$

where $h_d > 0$ is the window width, smoothing parameter, or bandwidth, $n$ is the number of atoms, $\mathbf{x}_i$ are their positions, and $K$ is the Gaussian kernel.
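The steps above can be sketched in one dimension as follows. This is a toy illustration of the idea, assuming a simple integrated-squared-difference error and an illustrative bandwidth; the chapter's actual procedure operates on the full 3D reconstruction.

```python
from math import exp, pi, sqrt

def gauss(x, mu, h):
    """Gaussian kernel of bandwidth h centred at mu."""
    return exp(-0.5 * ((x - mu) / h) ** 2) / (h * sqrt(2 * pi))

def voxel_error(positions, voxel_size, h, n_samples=200):
    """Integrated squared difference between the density estimated from
    exact atom positions and the density obtained after snapping each
    atom to the centre of its voxel (1-D sketch of steps (i)-(iv))."""
    lo = min(positions) - 3 * h
    hi = max(positions) + 3 * h
    centres = [(p // voxel_size + 0.5) * voxel_size for p in positions]
    err, dx = 0.0, (hi - lo) / n_samples
    for k in range(n_samples):  # midpoint-rule numerical integration
        x = lo + (k + 0.5) * dx
        f_exact = sum(gauss(x, p, h) for p in positions)
        f_voxel = sum(gauss(x, c, h) for c in centres)
        err += (f_exact - f_voxel) ** 2 * dx
    return err

positions = [0.12, 0.48, 0.51, 0.97, 1.42]
# Step (v): pick the candidate voxel size with the smallest difference
best = min([0.1, 0.25, 0.5, 1.0],
           key=lambda v: voxel_error(positions, v, h=0.2))
print(best)
```

In practice the error does not decrease monotonically forever: counting statistics penalize very small voxels, which is what makes the minimization in step (v) non-trivial; that trade-off is not modeled in this sketch.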
7.4 Topological Analysis for Defining Morphology of Precipitates
Interfaces and precipitate regions are typically identified in APT data by representing them as isoconcentration surfaces at a particular concentration threshold, which makes the choice of threshold critical. The popular approach to selecting an appropriate threshold is to draw a proximity histogram, which captures the average concentration gradient across the interface, and then visually identify the concentration value that best represents an interface or phase change. This makes the choice of threshold user-dependent and subjective. In this section, we showcase how persistent homology can be applied to better recover the morphology of the precipitates.
As discussed earlier and expanded upon in our prior work, the persistence of different topological features can be recorded as barcodes, which we now group according to Betti number. The horizontal axis represents the parameter ɛ, i.e., the range of connectivity among points in the point cloud, while the vertical axis shows the number of topological components present in the point cloud at each interval of ɛ. Some knowledge of the appropriate range for ɛ is required, such as the interatomic distance when dealing with raw atom probe data, or the voxel length if the data have been voxelized. The persistence of a feature measures whether it is actually present in the data or is an artifact appearing only over a narrow interval.
The top panel shows the evolution of the Betti numbers for varying Sc concentration. At each value of the Sc concentration threshold δ, those voxels having a concentration of δ ± 0.02 were chosen. Consider β0: at a high concentration threshold, beyond 0.5, only a very small number of simply connected components are observed, because very few voxels have a concentration equal to or greater than this threshold. As the threshold is decreased, more voxels qualify for inclusion, and β0 increases. The value of β0 remains constant over a certain range, indicating that these are real features; a plot of the voxels at δ = 0.3 shows that it indeed captures real clusters of Sc. With further decrease of the threshold, β0 decreases again, because every voxel outside the Sc clusters contains some minimal amount of Sc, and the inclusion of all exterior voxels results in a single connected component. We also observe a peak in β1 at the low concentration of δ = 0.03; plotting the isoconcentration surface for those voxels shows that these represent cavities. Voxels with very low Sc concentration sit at the edges of the Sc clusters and thereby enclose the clusters within themselves. A similar trend is observed with Mg: at low concentrations the Mg isosurfaces contain cavities that enclose Sc clusters, whereas at high concentrations there are few voxels.
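The β0-versus-threshold scan described above can be sketched on a voxel grid with a simple flood fill. This is an illustrative minimal version: it counts 6-connected components of voxels whose concentration meets a simple ≥ δ cut (the chapter selects a δ ± 0.02 band), and the toy grid and names are assumptions made for the example.

```python
from collections import deque

def beta0(grid, threshold):
    """Count connected components (beta_0) among voxels with
    concentration >= threshold, using 6-connectivity in 3-D.
    `grid` maps (i, j, k) voxel indices to concentrations."""
    occupied = {idx for idx, c in grid.items() if c >= threshold}
    seen, components = set(), 0
    for start in occupied:
        if start in seen:
            continue
        components += 1
        queue = deque([start])  # breadth-first flood fill
        seen.add(start)
        while queue:
            x, y, z = queue.popleft()
            for nb in ((x+1,y,z), (x-1,y,z), (x,y+1,z),
                       (x,y-1,z), (x,y,z+1), (x,y,z-1)):
                if nb in occupied and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return components

# Two separated high-Sc blobs bridged by a dilute matrix
grid = {(0,0,0): 0.6, (1,0,0): 0.5, (5,0,0): 0.7,
        (2,0,0): 0.05, (3,0,0): 0.05, (4,0,0): 0.05}
for d in (0.5, 0.3, 0.01):
    print(d, beta0(grid, d))  # beta0: 2, 2, 1
```

The output reproduces the qualitative trend in the text: β0 is stable over a range of thresholds (two real clusters), then collapses to one component once the dilute matrix voxels are included.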
7.5 Spatial Uncertainty in Isosurfaces
Atom probe tomography is a chemical imaging tool that produces data in the form of mathematical point clouds. Unlike most images, which have a continuous gray scale of voxels, atom probe images consist of discrete points associated with individual atoms. The informatics challenge is to assess nano- and sub-nanoscale variations in morphology associated with isosurfaces when clear physical models for image formation do not exist, given the uncertainty and sparseness of noisy data. In this chapter, we have provided an overview of the application of topological data analysis and computational homology as powerful new informatics tools that address such data challenges in exploring atom probe images.
We gratefully acknowledge support from NSF DIBBs Project OAC-1640867 and NSF Project DMR-1623838. KR acknowledges support from the Erich Bloch Endowed Chair at the University at Buffalo-State University of New York.
- 11. T.F. Kelly, D.J. Larson, K. Thompson, J.D. Olson, R.L. Alvis, J.H. Bunton, B.P. Gorman, Annu. Rev. Mater. Res. 37, 681 (2007)
- 14. M.K. Miller, A. Cerezo, M.G. Hetherington, G.D.W. Smith, Atom Probe Field Ion Microscopy (Clarendon Press, Oxford, 1996)
- 15. J. Rüsing, J.T. Sebastian, O.C. Hellman, D.N. Seidman, Microsc. Microanal. 6, 445 (2000)
- 16. D.N. Seidman, R. Herschitz, Acta Metall. 32, 1141 (1985)
- 17. D.N. Seidman, R. Herschitz, Acta Metall. 32, 1155 (1985)
- 35. L. Eriksson, T. Byrne, E. Johansson, J. Trygg, C. Vikström, Multi- and Megavariate Data Analysis: Principles and Applications (Umetrics AB, Umeå, 2001)
- 41. P.G. Cámara, Curr. Opin. Syst. Biol. 1, 95 (2017)
- 47. N. Otter, M.A. Porter, U. Tillmann, P. Grindrod, H.A. Harrington, arXiv:1506.08903 (2015)
- 53. K. Xia, G.-W. Wei, Int. J. Numer. Methods Biomed. Eng. 30, 814 (2014)
- 61. O.C. Hellman, J.A. Vandenbroucke, J. Rüsing, D. Isheim, D.N. Seidman, Microsc. Microanal. 6, 437 (2000)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.