Methods for Rapid Pore Classification in Metal Additive Manufacturing

The additive manufacturing of metals requires optimisation to find the melting conditions that give the desired material properties. A key aspect of the optimisation is minimising the porosity that forms during the melting process. A corresponding analysis of pores of different types (e.g. lack of fusion or keyholes) is therefore desirable. Knowing that pores form under different thermal conditions allows greater insight into the optimisation process. In this work, two pore classification methods were trialled: unsupervised machine learning and defined limits. These methods were applied to 3D pore data from X-ray computed tomography and 2D pore data from micrographs. Data were collected from multiple alloys (Ti-6Al-4V, Inconel 718, Ti-5553 and Haynes 282). Machine learning was found to be the most useful for 3D pore data and defined limits for the 2D pore data; the latter worked by optimising the limits using energy densities.


INTRODUCTION
Additive manufacturing (AM) is the catch-all term for a number of different technologies that melt material in an additive way rather than removing material from a larger piece. The AM of metal forms a significant fraction of AM research, with most metal processes based around the powder bed fusion (PBF) method. PBF entails the melting of successive layers of metal powder. 1 There are many benefits to AM regarding less material wastage and the potential for lightweight designs and rapid prototyping. 1 As AM technologies mature, research has focused on developing the technologies from a research and small-scale production tool into an industrial production method. 2 One of the biggest technological challenges of metal-based AM is porosity. The defects are often the cause of failure in AM parts 3 and are one of the biggest challenges to overcome in getting AM parts into aerospace, 4 medical 5 and many other applications. 6 AM parts can receive a hot isostatic press treatment to reduce porosity 7 but the majority of pore reduction can be achieved by optimising the process parameters.
Compared with conventional metal production processes, such as casting or rolling, the process parameters in AM are all relatively recent developments. For PBF these process parameters concern the power, velocity and pattern taken by the melting beam, all of which control the melting of metal powder. For any new powder composition or morphology, new parameters must be developed. Optimising the process parameters is usually guided by the material properties. The density, roughness and phases are all properties that can be considered as well as practical concerns, such as powder disturbances and swelling.
Even with all these considerations, porosity is still one of the most critical properties for optimising the process parameters. 8 Mechanical properties of AM-produced parts can depend heavily on the porosity, with fatigue, 9 strength 10 and Young's modulus 11 all decreasing with increased porosity. It is also worth noting that pores can vary significantly in size and shape and are formed because of different thermodynamic conditions. As large pores are more likely to cause structural failures, they are a higher priority for removal than small pores. 12
JOM https://doi.org/10.1007/s11837-019-03761-9 © 2019 The Author(s)
Optimising parameters usually involves some form of Design of Experiment (DoE) or other experiment scheme. The process involves trialling different parameters and then measuring the results to find which parameters worked well (low porosity) and which did not (high porosity). The best parameters will be the ones that fully melt the metal powder without causing porosity via overheating or any other means. Quickly and accurately classifying pores would enable optimisation to be carried out on the most critical types of pores, not just on the total pore volume.

Types of Pores
Three main types of pores are thought to form during AM processes: gas, keyholes (KH) and lack of fusion (LF). Each has a distinct shape, size and formation mechanism. 13,14,15
LF pores are caused by a lack of input energy to the powder bed during the melting process. The lower input energy fails to fully melt the metal powder and so leaves voids in the resultant structure. LF pores tend to be large, similar in scale to the melt pool size, and irregular, 13 as shown in Fig. 1b.
KH pores are caused by an excess of input energy during the melting process. Excess beam power causes excessive penetration of the metal powder, which after solidification leaves a pore near the bottom of the melt pool. The result is a relatively large pore that is usually circular horizontally and elongated in the vertical direction, as shown in Fig. 1a. KH pores can also be wider at the top than the bottom, looking somewhat like a keyhole. 14,15
Gas pores are the smallest and most spherical of all pores. These pores are attributed to trapped gas that was either already present in the metal powder or trapped during the melting process. These pores are usually the most prevalent pore type in AM. 13

Unsupervised Machine Learning Methods
Unsupervised machine learning, or clustering, is an algorithmic method of grouping data together without prior knowledge about the underlying data structure. The methods take data, with any number of variables, and collect them together according to a set of numerical metrics. Most simple methods rely on distance-based metrics, for example the Euclidean distance between data points. More sophisticated methods also include metrics of group similarity, overlap, information density, etc. The clustering in this research uses exclusively the size, shape and relative dimensions of pores as the variables to be clustered, with the hypothesis that there is an underlying relationship that links pore morphology. In the literature, there are examples of analyses related to AM that use the process parameters and pore locations, but these are outside the scope of this work.
There are many clustering methods, each with different strengths and weaknesses in terms of performance, as well as differences in the level of complexity and sophistication in the way the clusters are estimated. Some hierarchical algorithms 'grow' clusters from the available data, while others estimate potential cluster centres, then assign data to each cluster. The differences in performance usually relate to the computational speed of the algorithms and how they cope with specific patterns of data. One example, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), has the advantage that the number of clusters does not need to be specified. However, this method can suffer if there are large differences in the data density. 17 This is pertinent to pore data, where there may be relatively few large pores but classifying them is important.
Other examples include mixture models that can be modified into clustering algorithms. These models work best when the data fit a type of pattern (e.g. Gaussian). Given that the shape of the pore data is unknown and likely to vary significantly across data sets, a more general method was selected.
This research uses a K-means algorithm. This is one of the simplest clustering methods, based on using distance metrics to estimate potential cluster centres. While this algorithm may not be the most sophisticated computationally, it is very robust, and there are numerous examples in the literature of very good performance. The K-means algorithm starts by assigning randomly placed cluster centres, $\mu$, amongst the data set such that

$$\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n \tag{1}$$

where $K$ is the number of clusters. The data points are then labelled according to which cluster centre each data point is nearest to across $n$ dimensions (where $n$ is the number of variables). The closest cluster centre is found by taking each point, $x_i$, and measuring the distance to each cluster centre

$$\lVert x_i - \mu_k \rVert^2 \tag{2}$$

for all cluster centres, $k = 1, \ldots, K$. The $\mu_k$ with the minimum value labels the point $x_i$ as belonging to that cluster. Once this has been completed, the new cluster centres are calculated using the average position of the newly labelled data. This whole process is iterated until every data label and cluster centre is static. 18
The random initialisation of cluster centres may produce a result that is not optimal. To prevent this, the algorithm of random initialisation and clustering is repeated many times. A sensible number of repeats will balance computational time against the need for at least one initialisation to produce the best clustering result. Comparisons between cluster results are done using a cost function, $J$, that measures the distance between data points and cluster centres. This can be expressed as

$$J = \frac{1}{m} \sum_{i=1}^{m} \lVert x_i - \mu_{c_i} \rVert^2 \tag{3}$$

where $m$ is the total number of data points and $\mu_{c_i}$ is the cluster centre that each $x_i$ is assigned to. The cluster results with the minimum value of $J$ are selected as the optimal results.
The number of clusters that are present may be known or unknown. If the number of groups is unknown, the most appropriate number of clusters can be found. This is done by plotting the cost function, $J$, against the number of clusters and looking for an 'elbow' or 'kink' where $J$ drops suddenly. 18
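The K-means procedure described above (random initialisation, nearest-centre labelling, centre update, repeated restarts ranked by the cost function J) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: the function names, the choice to initialise centres on randomly selected data points, the iteration cap and the synthetic two-blob demo data are all assumptions made for the example.

```python
import numpy as np


def kmeans_once(X, K, rng, max_iter=100):
    """One K-means run from a random start; returns (labels, centres, J)."""
    # Assumption: centres are initialised on K distinct random data points.
    centres = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # labels, and therefore centres, are static
        labels = new_labels
        for k in range(K):
            members = X[labels == k]
            if len(members):  # an empty cluster keeps its previous centre
                centres[k] = members.mean(axis=0)
    # Cost J: mean squared distance of each point to its assigned centre.
    cost = ((X - centres[labels]) ** 2).sum(axis=1).mean()
    return labels, centres, cost


def kmeans(X, K, n_init=20, seed=0):
    """Repeat K-means n_init times and keep the run with the lowest J."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, K, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Hypothetical demo data: two well-separated blobs in two variables.
demo_rng = np.random.default_rng(1)
demo = np.vstack([
    demo_rng.normal(0.0, 0.1, (50, 2)),
    demo_rng.normal(5.0, 0.1, (50, 2)),
])
labels, centres, cost = kmeans(demo, K=2)
```

Keeping the restart loop separate from the single run makes the trade-off discussed above explicit: `n_init` is the only parameter that exchanges computation time for a better chance of finding the lowest-cost clustering.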

Pore Measurement Techniques
The two techniques that were used for data collection were XCT for 3D pore data and optical microscopy for 2D pore data. For the XCT, samples were analysed using the 320/225 kV Nikon XTEK bay at the Henry Moseley X-ray Imaging facility within the University of Manchester. A cone-shaped beam of X-rays was generated using an electron beam with an accelerating voltage of 160 kV and a current of 150 µA directed onto a silver reflection target. A 1.5-mm copper filter was used to filter out lower energy photons and reduce the effect of beam hardening. The samples were positioned to give a geometric magnification of approximately 22.5× before 3142 radiographs were collected with a 1-s exposure time using a square 4-megapixel detector with 0.2-mm pixel pitch. Three-dimensional volumes with 8.9 µm voxel size were reconstructed from the radiographs using proprietary Nikon software employing a filtered back projection algorithm. The 32-bit 3D volume was imported into Avizo image processing software and converted to 8 bits to reduce computational requirements. Data sets were also segmented and quantified in Avizo using built-in automatic thresholding and quantification tools. Numerical data regarding individual pores, including size, morphology and position, were exported for further analysis.
For optical microscopy the samples were removed from the base plates using wire electrical discharge machining 1 mm from the base. The samples were ground down by approximately 1 mm and then polished to a 9 µm finish. Measurements were carried out using an Olympus BX51 Clemex microscope with Clemex Vision PE software, which allows roundness and horizontal dimensions to be measured. An example micrograph is shown in Fig. 2.
In terms of classifying different types of pores the 3D pore data acquired using XCT provide more information per pore than optical microscopy. However, in terms of practicality, optical microscopy offers a quicker and more cost-effective way of collecting data for multiple samples. This is particularly relevant for AM parameter optimisation, which often requires dozens of samples to find the optimal process parameters.

Production Methods
Samples were produced using laser beam powder-bed SLM machines. The Ti-6Al-4V sample was built using a Renishaw 125. The sample was a cuboid, 10.61 mm in length and width and 15 mm in height. The process parameters used were a laser beam power of 200 W, point distance of 70 µm and exposure time of 150 µs. The hatch spacing was 70 µm and the layer thickness was 50 µm, which are optimised for melting Ti-6Al-4V on this machine.
The Inconel 718 samples were produced using an Aconity Mini with a range of process parameters. The samples were cubes with a side length of 10 mm. Laser beam power was varied from 60 W to 187 W, laser velocity between 0.3 m/s and 2.3 m/s and hatch distance between 25 µm and 140 µm. The thickness of the powder layer was 30 µm, which produced an energy density range of 11 J/mm³ to 200 J/mm³.
The Haynes 282 (a nickel superalloy) and Ti-5553 samples were also cubes with a side length of 10 mm but were produced on a Renishaw 125. Process parameters were varied between energy densities of 2 J/mm³ and 45 J/mm³ for the Haynes 282 and 0.4 J/mm³ and 267 J/mm³ for the Ti-5553. Full details of process parameters can be found in the supplementary data.
All samples were produced with gas atomised powder with a size range of 15 µm to 45 µm. The Ti-6Al-4V, Ti-5553 and Inconel 718 were all purchased from LPW Carpenter Additive, while the Haynes 282 was from Praxair.
There are cases where the porosity and quality of the as-received powder have a direct influence on the porosity that is found in subsequent builds. 19 However, the small particle size limits such powder-borne pores to 45 µm, so it is unlikely that these pores will be incorrectly classified as LF or KH pores.

Classification Method: Defined Limits
A defined limits approach sets limits on variables to identify types of pores. Knowing that pores tend to be small or large, irregular or round, etc., allows an unlabelled set of pore data to be classified. These limits for pores can be found in published values. 14,15,20,21 The length of a pore refers to the longest linear distance within the pore. Pore height is the length in the build direction and pore width refers to the longest distance in the horizontal plane.
LF pores are large and irregular and can be found using values for pore length and sphericity. The criteria selected were pores that were > 100 µm in length 20 and < 0.6 in sphericity.
KH pores are relatively large and tend to be elongated in the vertical direction. These pores can be identified using a minimum size criterion (above 100 µm 21 ) and a comparison of the height against the width of the pore (ensuring the height is at least double the width). 14,15
Similar criteria limits were applied to 2D pore data using the roundness and pore length. However, limitations with the 2D pore data meant that the data were better analysed using defined limits with optimisation.
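The 3D criteria above translate directly into two threshold tests. The sketch below is illustrative only: the `Pore` record, its field names and the two helper functions are hypothetical, and the thresholds are exposed as keyword arguments because, as discussed later, the appropriate values are open to interpretation. The two tests are kept independent because the text does not state how a pore meeting both sets of criteria should be labelled.

```python
from dataclasses import dataclass


@dataclass
class Pore:
    # Hypothetical record of per-pore measurements exported from XCT analysis.
    length_um: float   # longest linear distance within the pore
    height_um: float   # extent in the build direction
    width_um: float    # longest distance in the horizontal plane
    sphericity: float  # 1.0 for a perfect sphere


def is_lack_of_fusion(pore, min_length_um=100.0, max_sphericity=0.6):
    """LF criterion: large (length above the limit) and irregular."""
    return pore.length_um > min_length_um and pore.sphericity < max_sphericity


def is_keyhole(pore, min_length_um=100.0, min_aspect=2.0):
    """KH criterion: large and at least min_aspect times taller than wide."""
    if pore.width_um <= 0:
        return False  # aspect ratio undefined without a positive width
    aspect = pore.height_um / pore.width_um
    return pore.length_um > min_length_um and aspect >= min_aspect


# Hypothetical pores chosen to exercise each branch.
tall_pore = Pore(length_um=150, height_um=150, width_um=60, sphericity=0.7)
flat_pore = Pore(length_um=200, height_um=80, width_um=190, sphericity=0.4)
tiny_pore = Pore(length_um=30, height_um=30, width_um=30, sphericity=0.95)
```

Here `tall_pore` satisfies only the KH test, `flat_pore` only the LF test, and `tiny_pore` neither, so it would be left in the remaining (gas or unclassified) population.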

Classification Method: Defined Limits with Optimisation
This method uses defined limits to classify pores but finds the limits using optimisation. Instead of using existing data the limits are generated by analysing multiple data sets and finding the limits that best fit the data. This is particularly useful for 2D pore data where manually defining accurate limits is difficult but acquiring multiple data sets is relatively easy. For 2D pore data a defined limits method uses roundness and pore length, as shown in Fig. 3.
There is the opportunity to use the multiple data sets to find the pore length and roundness limits that best classify KH and LF pores. Given the energy density will dictate the formation of LF and KH pores, there should be a relationship between energy density and pore frequency. Energy density, $E$, can be measured using

$$E = \frac{P}{v\,h\,t} \tag{4}$$

where $P$ is the power of the laser beam, $v$ the velocity of the beam, $h$ the hatch spacing and $t$ the thickness of the layer. 22 This equation is suitable for laser AM machines that have a continuous laser. For machines that have a pulsed laser a slightly modified equation is required, where the point distance and exposure time replace the velocity of a continuous laser beam. 23
Increasing energy density should increase the number of KH pores and decrease the number of LF pores. An example of this type of energy density plot is shown in Fig. 4.
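As a worked example of the volumetric energy density, the sketch below computes E from power, velocity, hatch spacing and layer thickness. The function names and units are choices made for this illustration. For the pulsed-laser case the text only says that point distance and exposure time replace the velocity; the helper below assumes an effective velocity equal to point distance divided by exposure time, which is a common simplification and not necessarily the exact form given in the cited reference.

```python
def energy_density(power_w, velocity_mm_s, hatch_mm, layer_mm):
    """Volumetric energy density in J/mm^3 for a continuous laser."""
    return power_w / (velocity_mm_s * hatch_mm * layer_mm)


def pulsed_energy_density(power_w, point_distance_mm, exposure_s,
                          hatch_mm, layer_mm):
    """Pulsed-laser variant.

    Assumption (not taken from the source): the effective velocity is
    point distance / exposure time.
    """
    effective_velocity = point_distance_mm / exposure_s
    return energy_density(power_w, effective_velocity, hatch_mm, layer_mm)


# Hypothetical continuous-laser settings: 100 W, 1 m/s, 100 um hatch,
# 30 um layer.
e_continuous = energy_density(100.0, 1000.0, 0.100, 0.030)

# The Ti-6Al-4V settings quoted in Production Methods (200 W, 70 um point
# distance, 150 us exposure, 70 um hatch, 50 um layer), evaluated under the
# effective-velocity assumption above.
e_pulsed = pulsed_energy_density(200.0, 0.070, 150e-6, 0.070, 0.050)
```

With these inputs `e_continuous` is about 33 J/mm³ and `e_pulsed` about 122 J/mm³; the latter is only as reliable as the effective-velocity assumption.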
In Fig. 4, two best fit lines have been added for LF and KH pores. The best fit lines were plotted using either a polynomial, logarithmic, power-law or exponential equation, whichever fit the data best. The polynomial fit goes to third order to allow a close fit without overfitting the data. The quality of the agreement between the best fit line and the data can be measured by finding the coefficient of determination (R²) for each line.
This process of plotting the data and finding the quality of the agreement can then be optimised using trial and error with different defined limits (the limits for the boxes in Fig. 3). The different limits will produce different levels of agreement between best fit lines and the data on the energy density plots (as shown in Fig. 4). The resultant R² values can then be compared to find the defined limits that best describe the LF and KH pores.
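The search over candidate limits can be sketched as a loop: classify pores with each candidate pair of limits, fit the resulting LF frequency against energy density, and keep the limits with the highest R². The sketch below makes several simplifying assumptions that are not from this work: it replaces trial and error with an exhaustive grid, fits only a third-order polynomial (no logarithmic, power-law or exponential alternatives), takes 'frequency' to mean the fraction of pores in a sample, and runs on synthetic data. All function names are hypothetical.

```python
import itertools

import numpy as np


def r_squared(y, y_fit):
    """Coefficient of determination between data and fitted values."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def lf_fraction(lengths, roundness, min_length, max_roundness):
    """Fraction of pores in one sample that meet the candidate LF limits."""
    return np.mean((lengths > min_length) & (roundness < max_roundness))


def best_lf_limits(samples, energy, length_grid, roundness_grid, order=3):
    """Grid search for the LF limits whose trend fits energy density best."""
    energy = np.asarray(energy, dtype=float)
    best = (-np.inf, None, None)
    for min_len, max_round in itertools.product(length_grid, roundness_grid):
        frac = np.array([lf_fraction(lengths, rnd, min_len, max_round)
                         for lengths, rnd in samples])
        coeffs = np.polyfit(energy, frac, order)
        score = r_squared(frac, np.polyval(coeffs, energy))
        if score > best[0]:
            best = (score, min_len, max_round)
    return best


# Synthetic, purely illustrative data: 12 samples whose share of large,
# irregular pores falls as energy density rises.
rng = np.random.default_rng(0)
energy = np.linspace(20.0, 200.0, 12)
samples = []
for e in energy:
    n_lf = int(400 * 0.5 * (1.0 - e / 220.0))
    n_other = 400 - n_lf
    lengths = np.concatenate([rng.uniform(60, 150, n_lf),
                              rng.uniform(5, 40, n_other)])
    roundness = np.concatenate([rng.uniform(0.05, 0.3, n_lf),
                                rng.uniform(0.6, 1.0, n_other)])
    samples.append((lengths, roundness))

score, min_len, max_round = best_lf_limits(
    samples, energy, length_grid=[20, 50, 80], roundness_grid=[0.2, 0.4, 0.6])
```

The same loop applies to KH pores by swapping the classification test and expecting the frequency to rise, rather than fall, with energy density.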

Classification Method: K-Means Clustering
An alternative approach is to cluster the pore data to find the LF and KH pores rather than to define them using set limits. This avoids the problem of not knowing where to set the limits for pore classification. The clustering can be done using any number of variables, although adding extra variables will slow the calculations and potentially increase the noise in the data.
To ensure each variable contributes equally to the clustering, the values are all scaled to a similar range. The new values, $x_{new}$, were calculated using the average value, $x_{avg}$, and the range of values, such that

$$x_{new} = \frac{x_i - x_{avg}}{x_{max} - x_{min}} \tag{5}$$

where $x_i$ is the original value and the range is calculated using the minimum value, $x_{min}$, and maximum value, $x_{max}$, for each variable. This scaling ensures that all variables have a range of 1 and will therefore cluster the data with equal weight across all variables.
A K-means algorithm was used to cluster the data. 18 Clusters were given a random initialisation and optimised until the clusters were stable. This was repeated 500 times, with the best solution (the one with the closest fitting groups) used. The variables used for grouping the 3D pore data were pore length, sphericity and ratio of height to width.
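The scaling step can be illustrated with a short NumPy function that subtracts each variable's mean and divides by its range. The function name, the guard for constant columns and the example values are assumptions for this sketch.

```python
import numpy as np


def mean_range_scale(X):
    """Scale each column to zero mean and a range of 1."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # assumption: leave constant columns undivided
    return (X - X.mean(axis=0)) / span


# Hypothetical pores: columns are length (um), sphericity, height/width ratio.
pores = np.array([
    [10.0, 0.9, 1.0],
    [200.0, 0.2, 3.0],
    [50.0, 0.5, 2.0],
])
scaled = mean_range_scale(pores)
```

Without this step the pore length, which spans hundreds of micrometres, would dominate a Euclidean distance over sphericity, which spans less than 1.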
Even though there are three main types of pores (LF, KH and gas as described earlier) this does not necessarily mean that there are three clusters within the data. It may be that some types of pore vary in properties enough that they are best described using multiple clusters. For example, instead of a single cluster to describe gas pores there may need to be a cluster for typical gas pores and another cluster for the smallest gas pores.
The total number of clusters for the data was found by analysing the data using different numbers of clusters (two to ten). The distance between the pores in each cluster, the cost function from Eq. 3, describes how well the clusters describe the data. More clusters will always reduce the cost function. If an additional cluster produces a sudden drop in the cost function this usually indicates the correct number of clusters. 18
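Identifying the sudden drop by eye can be supplemented with a simple numerical check. The sketch below is one possible heuristic, not the procedure used in this work: it picks the number of clusters at which the cost curve bends most sharply, measured by the discrete second difference. The cost values are hypothetical.

```python
import numpy as np


def elbow_k(ks, costs):
    """Return the K at which the cost curve has its largest upward bend."""
    ks = np.asarray(ks)
    costs = np.asarray(costs, dtype=float)
    # Second difference: large and positive where a steep drop is
    # followed by a much flatter section.
    curvature = costs[:-2] - 2.0 * costs[1:-1] + costs[2:]
    return int(ks[1:-1][np.argmax(curvature)])


# Hypothetical cost values for K = 2..8 with a sudden drop at K = 5.
ks = [2, 3, 4, 5, 6, 7, 8]
costs = [10.0, 9.5, 9.0, 5.0, 4.8, 4.6, 4.5]
chosen_k = elbow_k(ks, costs)  # 5 for these values
```

Because the first and last tested values of K have no neighbour on one side, they can never be selected; the tested range should therefore extend past the expected number of clusters.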

Summary of Results
The main focus of this research is the development of the methods rather than investigating specific materials or process parameters. As such, materials were selected that best illustrate the strengths and weaknesses of the different methods.
The material selection was also based on the materials and data sets that were available. The details of the methods and materials are listed in Table I.

Results: Defined Limits Method-3D Pore Data
Criteria limits were applied to 3D pore data (2664 pores) from a Ti-6Al-4V cuboid. KH pores were identified as having a length > 0.1 mm 21 and an aspect ratio (in the build direction) > 2 14,15 as shown in Fig. 5a. LF pores were identified as being > 0.1 mm 20 with a sphericity < 0.6, which was judged to be the limit for being 'irregular' as shown in Fig. 5b.
These limits found there were 186 LF pores and just 1 KH pore. The most significant problem with the defined limits method is deciding what limits should be used. For criteria such as minimum sizes there are multiple sources with different values. For sphericity values there are no published values, and which value counts as 'irregular' is open to interpretation. As Fig. 5 shows, these limits have a noticeable effect on the results.

Results: K-Means Clustering-3D Pore Data
The number of clusters selected was five, as the sudden drop in J (from Eq. 3) occurred then. The cluster results are plotted in Fig. 6. Figure 6a shows the LF pores (the largest and least spherical) and the gas pores (the smallest and most spherical). The KH pores (elongated and larger) can be found in Fig. 6b. The other clusters (orange and green) are unclear but are likely to be slightly larger and slightly more irregular gas pores, although further investigation would be required to confirm these clusters. The cluster sizes show 183 LF pores and 373 KH pores.

Results: K-Means Clustering-2D Pore Data
Clustering on roundness and pore length was applied to 2D pore data from 81 Inconel 718 samples. The average number of pores per data set was 21955, with the smallest data set having 2981 pores. Clustering results did not consistently identify pores. LF pores were occasionally found but KH pores were rarely identified. Some examples of the clustering results are shown in Fig. 7.
Using defined limits on 2D pore data does not work well. The difficulty in selecting appropriate limits is even more problematic for 2D pore data than for 3D pore data. As a result, the optimisation approach described in the Methods section was used.
Analysis was carried out on 81 Inconel 718 samples. The results showed that the best limits for classifying LF pores were > 31 µm in length and < 0.32 roundness. The best limits for KH pores were > 61 µm in length and > 0.8 roundness. The best fit lines on the energy density plot (similar to Fig. 4)

Three-dimensional Pore Data Results
For the 3D pore data either K-means clustering or defined limits may be suitable. For LF pores both methods had similar results, respectively predicting 183 and 186 pores out of 2664 pores. For KH pores there was a significant difference in results with 373 predicted by the K-means clustering and 1 using defined limits. The values for the defined limits though could vary significantly depending on the limits selected.
The veracity of the two approaches can be confirmed by using more data sets. Data sets on samples that have been made with purposefully different energy densities should show different fractions of LF and KH pores. This can be used to refine the two approaches and should reveal which approach is superior.

Two-dimensional Pore Data Statistics
All the analysis on 2D pore data has assumed that the cross-sections of the pores cut across the middle of the pores at the widest part. This is not a true reflection of the shape of pores, though, and there will be some pores that are cross-sectioned at a thinner part. This means that some of the pores that appear small will actually be much larger pores. Consequently, the predicted numbers of the larger pores, LF and KH, will be an underestimate.
From an optimisation standpoint this statistical bias is not important. All of the conclusions from this work, in terms of both pore classification and optimisation, use relative amounts of porosity. As pore data are only compared with other samples the bias on the overall amount should not affect the conclusions.
If the absolute numbers of pores were desired, they could be calculated by considering the probabilities of where pores have been cross-sectioned. This is a feasible task but would require 3D pore data to train the statistical corrections. These 3D pore data would require detailed mapping of pores to confirm what the 2D slices of each pore type look like along the full height of each pore.
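The size of the sectioning bias can be illustrated for the simplest possible case. The sketch below assumes perfectly spherical pores cut by a plane at a uniformly random offset from the pore centre; this idealisation is only plausible for gas pores and is not a model used in this work. Under that assumption the mean apparent radius is π/4 (about 0.79) of the true radius, which the Monte Carlo estimate reproduces.

```python
import numpy as np


def mean_section_ratio(n=200_000, seed=0):
    """Mean apparent radius / true radius for randomly sectioned spheres."""
    rng = np.random.default_rng(seed)
    # Offset of the cutting plane from the sphere centre, as a fraction of
    # the true radius (assumption: uniform over the sphere's height).
    offset = rng.uniform(-1.0, 1.0, n)
    section_radius = np.sqrt(1.0 - offset ** 2)
    return section_radius.mean()


ratio = mean_section_ratio()
analytic = np.pi / 4.0  # exact mean for the idealised spherical case
```

Irregular LF pores and elongated KH pores would need shape-specific corrections, which is why 3D data would be required to calibrate them.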

Two-dimensional Pore Data Results
The 2D pore data could not be analysed using K-means clustering. It may be possible to classify the pores using a different machine-learning method, but there will always be the same data problems: too few variables and a continuous spectrum of values that are difficult to consistently classify.
Using defined limits with optimisation on 2D pore data was more successful. The limits were not predetermined, as was the case in the 3D pore data, but were found by fitting the pore classification results to the energy densities of the samples. This approach allowed defined limits to be found that could be used to classify LF and KH pores.
For the Inconel 718 samples, the LF and KH limits were generated with R² values of 0.44 for LF and 0.4 for KH. These are relatively modest R² values indicating only a moderate relationship at best. This is to be expected given the values that are being used in the R² calculations. Describing the porosity formation with just energy density values is unlikely to be perfect and so the relatively low R² values are understandable. A better model may be possible if energy density is replaced with a more complex model of energy input and material conductivity.
The other results showed similar trends and conclusions as the data from Inconel 718. The values for pore length and roundness for the Haynes 282 and Ti-5553 seem reasonable. There are large differences in where the defined limits form but this is expected. It reflects the fact that pore shapes and sizes vary between materials and classifying pores requires adaptable values rather than a set limit for all materials.

CONCLUSION
A K-means clustering and a defined limit method were trialled on both 2D (micrographs) and 3D pore data (XCT). The different methods had different degrees of success and not all would be suitable for future use.
1. Three-dimensional Pore Data Defined Limits was a reasonable method but there is difficulty selecting suitable limits. Some of the limits had no sources and even those with sources often had values that varied significantly.
2. Three-dimensional Pore Data Clustering was a good approach with it being possible to identify LF, KH and gas pores. A benefit of this approach though was the removal of user input in deciding any defined limits.
3. Two-dimensional Pore Data Clustering did not work well. Although available data established a fair starting point for 2D pore data clustering, more information needs to be stored to reliably classify pores.
4. Two-dimensional Pore Data Defined Limits did work but only after an optimisation process. The arbitrary nature of the defined limits was removed by calculating the limits using the data across many data sets. This approach was used successfully on three different alloys.