Optimal principal component analysis of STEM XEDS spectrum images
Abstract
STEM XEDS spectrum images can be drastically denoised by application of principal component analysis (PCA). This paper looks inside the PCA workflow step by step using the example of a complex semiconductor structure consisting of a number of different phases. Typical problems distorting the principal component decomposition are highlighted and solutions for successful PCA are described. Particular attention is paid to the optimal truncation of principal components in the course of reconstructing denoised data. A novel, accurate and robust method, which outperforms the existing truncation methods, is suggested for the first time and described in detail.
Keywords
PCA Spectrum image Reconstruction Denoising STEM XEDS EDS EDX
Abbreviations
 TEM
transmission electron microscopy
 STEM
scanning transmission electron microscopy
 XEDS
X-ray energy-dispersive spectroscopy
 EELS
electron energy-loss spectroscopy
 SDD
silicon drift detector
 PCA
principal component analysis
 SVD
singular value decomposition
 CMOS
complementary metal-oxide semiconductor
 HAADF
high-angle annular dark field
 ToF-SIMS
time-of-flight secondary ion mass spectroscopy
Background
Scanning transmission electron microscopy (STEM) delivers images of nanostructures at high spatial resolution matching that of broad beam transmission electron microscopy (TEM). Additionally, modern STEM instruments are typically equipped with electron energy-loss spectrometers (EELS) and/or X-ray energy-dispersive spectroscopy (XEDS, sometimes abbreviated as EDS or EDX) detectors, which make it possible to turn images into spectrum-images, i.e. pixelated images, where each pixel represents an EEL or XED spectrum. In particular, the recent progress in STEM instrumentation and large collection-angle silicon drift detectors (SDD) [1, 2] has made possible the fast acquisition of large STEM XEDS spectrum-images consisting of 10–1000 million data points. These huge datasets typically show some correlations in the data distribution, which might be retrieved by application of statistical analysis and then utilized for improving data quality.
The simplest and probably the most popular multivariate statistical technique is principal component analysis (PCA), which expresses available data in terms of orthogonal linearly uncorrelated variables called principal components [3, 4, 5, 6, 7, 8, 9]. In general terms, PCA reduces the dimensionality of a large dataset by projecting it onto an orthogonal basis of lower dimension. It can be shown that among all possible linear projections, PCA ensures the smallest Euclidean difference between the initial and projected datasets or, in other words, provides the minimal least-squares error when approximating data with a smaller number of variables [10]. Due to that, PCA has found many applications in imaging science for data compression, denoising and pattern recognition (see for example [11, 12, 13, 14, 15, 16, 17, 18]) including applications to STEM XEDS spectrum-imaging [19, 20, 21, 22, 23, 24].
A starting point for the PCA treatment is the conversion of a dataset into a matrix \(\mathbf {D}\), where spectra are placed on the matrix rows and each row represents an individual STEM probe position (pixel). Assume for definiteness that the \(m \times n\) matrix \(\mathbf {D}\) consists of m pixels and n energy channels. Although STEM pixels may be originally arranged in 1D (linescan), 2D (datacube) or in a configuration with higher dimensions, they can always be recast into a 1D train because the neighborhood among pixels does not play any role in the PCA treatment. PCA is based on the assumption that there are certain correlations among the spectra constituting the data matrix \(\mathbf {D}\). These correlations appear because the data variations are governed by a limited number of latent factors, for example by the presence of chemical phases with fixed composition. The spectral signatures of latent factors might, however, not be apparent as they are masked by noise. In this respect, PCA relates closely to factor analysis [7], although principal components generally do not coincide with the latent factors but rather represent their linear combinations [25].
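The conversion of a data cube into the matrix \(\mathbf {D}\) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `unfold` and `fold` and the (rows, columns, channels) axis order are assumptions made for the example.

```python
import numpy as np

def unfold(cube):
    """Reshape a (rows, cols, channels) spectrum-image into an (m, n) matrix D.

    Each row of D is the spectrum of one pixel; m = rows * cols, n = channels.
    The axis order of the cube is an assumption of this sketch.
    """
    rows, cols, channels = cube.shape
    return cube.reshape(rows * cols, channels)

def fold(matrix, rows, cols):
    """Inverse of unfold: restore the (rows, cols, channels) data cube."""
    return matrix.reshape(rows, cols, matrix.shape[1])

# Toy cube with sparse Poisson counts (hypothetical data, far smaller than a real set).
rng = np.random.default_rng(0)
cube = rng.poisson(0.01, size=(6, 8, 20))
D = unfold(cube)          # 48 pixels x 20 energy channels
restored = fold(D, 6, 8)  # identical to the original cube
```

Because the pixel order is only a bookkeeping device, any consistent unfolding works as long as the same order is used when the reconstructed matrix is folded back.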
The data matrix \(\mathbf {D}\) expressed by (1) can be subjected to dimensionality reduction or, in other words, truncation of components. Such dimensionality reduction might serve various purposes, for instance, it can be a first step for more complicated multivariate statistical treatment like unmixing data and extraction of latent factors. In the simplest case, dimensionality reduction can be utilized for removing the major part of noise from data, i.e. for its denoising.
The following questions are at the heart of the method. How much can the dimensionality of a given dataset be reduced? How many components must be retained to reproduce the data variation adequately, and how many of them may be truncated to reduce noise? This paper attempts to address these crucial questions using the example of typical XEDS spectrum-images obtained in modern STEM instruments.
At first glance, the reasonable number of retained components should be equal to the known (or expected) number of latent factors behind the data variations. It will be shown, however, that the situation is more complicated and the number of meaningful components might differ strongly from the number of latent factors; typically, there are fewer components than factors. The reason for this deviation is the unavoidable corruption of data with noise.
To explore the topic most comprehensively, we considered an object with a very large number of latent factors and analyzed its experimental XEDS spectrum image. In parallel, we generated a twin synthetic object that mimicked the real one in all its essential features. An advantage of the synthetic data is the possibility to exclude noise in simulations and, therefore, to compare the noisy data with the noise-free reference.
PCA is often regarded as a fixed procedure, where little can be altered or tuned. In reality, there are a number of hidden issues hampering the treatment and leading to dubious results. A better understanding of the potential issues might help to design the optimal treatment flow, improving the efficiency and avoiding artifacts. The systematic comparison between the experimental and synthetic datasets on the one hand, and between the synthetic noisy set and the noise-free reference on the other hand, allowed us to identify the typical obstacles in the treatment flow and find solutions for the optimal principal component decomposition and reconstruction of the denoised data.
Below, it will be demonstrated that certain pretreatments, namely weighting datasets and reducing their sparseness, are essential for the successful PCA of STEM XEDS data. Ignoring these pretreatments would dramatically deteriorate the denoising effect of PCA and might cause severe artifacts. This paper also addresses the problem of the optimal truncation of principal components in the course of reconstructing denoised data. A new accurate and robust method, which outperforms the existing truncation methods, is suggested and tested with a number of experimental and synthetic objects.
The paper is organized as follows: "Multicomponent object for spectrumimaging" section describes an object investigated with STEM XEDS and also its synthetic twin object designed to mimic the real one. "Principal component decomposition" section follows all steps of the principal component decomposition and highlights the potential problems distorting PCA in the case of XEDS spectrum images. "Truncation of principal components and reconstruction" section presents the theoretical background for truncation of principal components and discusses the existing practical truncation methods. A novel method for automatic determination of the optimal number of components is introduced in "Anisotropy method for truncation of principal components" section. At the end of "Truncation of principal components and reconstruction" section, the results of the spectrum-image reconstruction are shown and the denoising ability of PCA is demonstrated.
Results and discussion
Multicomponent object for spectrumimaging
Table 1 Composition of layers (phases) constituting the investigated CMOS device
Phase notation  Composition (at.%)
Si              100% Si
SiOA            33% Si–67% O
SiOB            29% Si–57% O–14% N
HfO             33% Hf–67% O
TiNA            50% Ti–50% N
TiNB            50% Ti–40% N–10% O
TiNC            45% Ti–45% N–10% Al
TaN             50% Ta–50% N
Al              80% Al–20% Ti
AlO             40% Al–60% O
SiN             43% Si–57% N
In parallel to the experiment, we generated a twin synthetic object with layers of the same composition and roughly the same relative volume fractions (Fig. 2b). As demonstrated below, the synthetic object closely resembles the experimental one, which helps to identify important regularities in its PCA treatment. XEDS spectrum-images of the synthetic object were then generated in two variants: with and without added Poisson noise. These two datasets will be referred to as the noisy and noise-free synthetic datasets in the present paper. The generated noisy and noise-free spectrum-images are presented in the DigitalMicrograph format in Additional files 1 and 2, respectively. The details of the experiment and simulations are described in "Experimental details" and "Details of simulation" subsections.
Principal component decomposition
Unweighted PCA
It should be noted that in most cases, the first principal component differs drastically (in terms of the extraction accuracy) from the other ones because the first component consists of the mean data spectrum. To relax this difference we always subtract the mean spectrum from the data matrix \(\mathbf {D}\) prior to the principal component decomposition. This operation is usually referred to as data centering.
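Data centering followed by the principal component decomposition can be sketched with a singular value decomposition. This is a minimal illustration under stated assumptions: the function name `pca_decompose` is hypothetical, and normalizing the variances by the number of pixels m is a convention chosen here, not necessarily the one used for the scree plots in this paper.

```python
import numpy as np

def pca_decompose(D):
    """Center D (m pixels x n channels) and decompose it by SVD.

    Returns the mean spectrum, the scores (m x r), the eigenspectra
    (r x n, one per row) and the variance carried by each component.
    """
    mean_spectrum = D.mean(axis=0)
    centered = D - mean_spectrum              # data centering
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U * s                            # projections of pixels on components
    variances = s**2 / D.shape[0]             # assumed normalization by m
    return mean_spectrum, scores, Vt, variances

# Toy data (hypothetical): 30 pixels, 8 channels.
rng = np.random.default_rng(0)
D = rng.normal(size=(30, 8))
mean_spectrum, scores, eigenspectra, variances = pca_decompose(D)
rebuilt = scores @ eigenspectra + mean_spectrum   # equals D when all components are kept
```

Plotting `variances` against the component index on a logarithmic scale gives a scree plot of the kind discussed in the following paragraphs.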
In contrast, the scree plot for the noisy dataset (Fig. 4b) correlates very poorly with the reference noise-free case. About 20 principal components can be roughly counted as meaningful, but there is no clear separation between them and the noise domain. The reconstruction with such a number of components leads to unsatisfactory results dominated by noise and artifacts, as will be shown in "Reconstruction of denoised datasets" section. We can conclude that PCA fails for the noisy synthetic dataset. It presumably also fails for the experimental dataset because its scree plot is quite similar to that of the noisy simulation.
The reason for the failure of classical PCA of noisy STEM EDX datasets is well known. PCA is able to extract meaningful variations only when the superimposed noise has a similar level of variance in any given fragment of a dataset or, in other words, when the noise is homoscedastic. In fact, the dominant noise in XEDS spectra is Poisson noise, which is not homoscedastic.
Weighted PCA
It should be stressed, however, that the elements of \(\mathbf {W}\) provide only the estimates of the “true” signal level across the dataset. This estimation works typically quite well for STEM EELS but might be rather inaccurate in the case of STEM XEDS datasets as will be shown below.
After weighting the noise-free synthetic dataset, its scree plot (Fig. 4c) indicated 11 meaningful components, i.e. one more than in the unweighted case. This can be explained by the nonlinearity in the data variations, which was shown to increase the number of the observed components relative to the number of latent factors [25]. Such nonlinearity often exists in real-life objects and might be enhanced by the weighting rescaling.
Unfortunately, weighting does not improve the quality of the principal component decomposition of the considered noisy synthetic dataset. Figure 3c demonstrates that the found eigenspectra still show poor correlation with the “true” ones. The 1st and 2nd true eigenspectra are partially retrieved in the 3rd–6th components of the noisy dataset, but the remaining meaningful components seem to be almost completely lost. In addition, the domains of meaningful and noise components in the scree plots (Fig. 4d) are not clearly separated for either the noisy synthetic or the experimental dataset.
The failure of the weighting pretreatment in STEM EDX spectrum-imaging has been reported earlier [28, 29]. The reason for the problem is the high sparsity of typical STEM XEDS data, which makes the evaluation of the elements of matrix \(\mathbf {W}\) inaccurate. The sparsity of both the experimental and the noisy synthetic datasets in the present work was about 0.001, which means that only 0.1% of the elements in the data matrix \(\mathbf {D}\) were filled with a signal while 99.9% of them were empty. In this situation, the extracted mean spectrum and mean image suffer from random variations, which makes the weighting pretreatment dangerous.
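The sparsity figure and a weighting built from the mean image and mean spectrum can be sketched as follows. The weighting formula itself is not reproduced in this section, so the rank-one estimate used here (outer product of mean image and mean spectrum divided by the grand mean) is an assumption in the spirit of the commonly used Poisson scaling; the function names are hypothetical.

```python
import numpy as np

def sparsity(D):
    """Fraction of non-empty elements of D (0.001 means 0.1% of elements are filled)."""
    return np.count_nonzero(D) / D.size

def poisson_weight(D):
    """Divide D by the square root of a rank-one estimate W of the signal level.

    W is assumed to be outer(mean image, mean spectrum) / grand mean.
    Elements where W is zero are left at zero.
    """
    D = np.asarray(D, dtype=float)
    W = np.outer(D.mean(axis=1), D.mean(axis=0)) / D.mean()
    root = np.sqrt(W)
    Dw = np.divide(D, root, out=np.zeros_like(D), where=root > 0)
    return Dw, W

# Dense toy data (hypothetical) whose expected signal is exactly rank one.
rng = np.random.default_rng(0)
expected = np.outer(np.linspace(1, 5, 50), np.linspace(2, 20, 30))
D = rng.poisson(expected)
Dw, W = poisson_weight(D)
residual_variance = np.var((D - W) / np.sqrt(W))   # close to 1 when W is accurate
```

For dense data such as this toy set, the weighted noise has nearly unit variance everywhere. When only about 0.1% of the elements are filled, the mean image and mean spectrum are themselves dominated by counting noise, so `W` becomes a poor estimate, which is the failure described above.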
Appendix 2 considers the effect of sparsity on the weighting efficiency in detail and Appendix 3 presents simulations confirming the conclusions of Appendix 2.
PCA with smoothing filter pretreatment
The most evident way to solve the sparsity issue outlined in the previous subsection is to smooth a dataset in either the spatial or energy directions prior to the PCA treatment. Smoothing reduces the sparsity of a dataset while hopefully preserving its general features. The simplest smoothing filter is binning the data as suggested by Kotula and van Bethlem [28]. Binning also reduces the size of a dataset, which boosts the calculation speed and saves storage capacity. The disadvantage of intensive binning is a significant loss of spatial and energy resolution; thus it should be employed only if the original dataset was oversampled for the required task (e.g. for obtaining an elemental map of given resolution or resolving certain features in spectra). Alternatively, data can be smoothed by Gaussian kernel filtering in the spatial or energy directions [29]. Gaussian smoothing fills the empty data elements even more efficiently than binning does, while it deteriorates the resolution only slightly. On the other hand, Gaussian smoothing does not improve the calculation speed because the data size is unchanged. 2D Gaussian filtering in the spatial X and Y dimensions is most efficient in terms of reducing the data sparsity. Note that it must be performed before conversion of a data cube into matrix \(\mathbf {D}\), while the spatial arrangement of pixels is still available.
In the present work, a combination of binning and Gaussian filtering was employed to overcome the sparsity issue. For comparison purposes, the same filtering was applied to the experimental, noisy and noise-free synthetic datasets. The datasets were first subjected to \(2 \times 2\) spatial binning, which provided a fourfold reduction in size. Then, Gaussian kernel filtering with the standard deviation \(\sigma = 1\) pixel was applied. To save calculation time, the Gaussian function was truncated at 10% of its maximum such that the kernel mask included 12 pixels around the central pixel (see [29] for details). No smoothing in the energy direction was applied.
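The filtering pretreatment described above can be sketched as follows. This is an illustration, not the temDM implementation: the function names, the summation of counts during binning, the (rows, columns, channels) axis order and the edge padding at the image borders are assumptions. Truncating a \(\sigma = 1\) Gaussian at 10% of its maximum keeps the central pixel plus 12 neighbors, which the sketch reproduces.

```python
import numpy as np

def bin2x2(cube):
    """Sum counts over 2 x 2 blocks of pixels; odd trailing rows/columns are dropped."""
    r, c, n = cube.shape
    r2, c2 = r // 2, c // 2
    return cube[:2 * r2, :2 * c2].reshape(r2, 2, c2, 2, n).sum(axis=(1, 3))

def truncated_gaussian_kernel(sigma=1.0, cutoff=0.1):
    """2D Gaussian set to zero below `cutoff` times its maximum, normalized to unit sum."""
    radius = int(np.floor(sigma * np.sqrt(-2.0 * np.log(cutoff))))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    kernel[kernel < cutoff] = 0.0
    return kernel / kernel.sum()

def smooth_spatial(cube, kernel):
    """Apply the kernel in the two spatial dimensions of every energy channel."""
    radius = kernel.shape[0] // 2
    pad = ((radius, radius), (radius, radius), (0, 0))
    padded = np.pad(cube.astype(float), pad, mode="edge")   # border handling is an assumption
    rows, cols = cube.shape[:2]
    out = np.zeros(cube.shape, dtype=float)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            if kernel[dy, dx] > 0:
                out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

# Sparse toy cube (hypothetical data).
rng = np.random.default_rng(0)
cube = rng.poisson(0.01, size=(20, 20, 30))
kernel = truncated_gaussian_kernel(sigma=1.0, cutoff=0.1)
filtered = smooth_spatial(bin2x2(cube), kernel)
filled_before = np.count_nonzero(cube) / cube.size
filled_after = np.count_nonzero(filtered) / filtered.size
```

The fraction of filled elements rises substantially after the two steps, which is the purpose of the pretreatment; the energy axis is left untouched, as in the text.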
The filtering pretreatment dramatically improves the quality of the principal component decomposition, as demonstrated in Fig. 3d. The eigenspectra of at least 6 major components of the noisy synthetic dataset are now in good proximity to the reference noise-free eigenspectra. The 7th component (blue in Fig. 3c) lies at the limit of detectability: although the proximity function is rather widespread, its maximum seems to stay at approximately the correct position.
The scree plot of the noisy synthetic dataset in Fig. 4f now clearly visualizes two domains: the domain of the meaningful components with a higher variance and the noise domain, where the variance follows a steadily decreasing line. The border between the two domains is located near the 6th–8th component. Superposing the scree plots of the noisy and noise-free datasets reveals that they closely follow each other up to the 6th component. On the other hand, the scree plot of the experimental dataset is very similar to that of the noisy synthetic one, which suggests that most of the components of the real-life object are retrieved accurately.
Truncation of principal components and reconstruction
At the next step of the PCA treatment, it is assumed that only a few major PCA components carry the useful information while the remaining minor components represent noise. Therefore, a dataset can be reconstructed using only k \((k\ll n)\) major components as illustrated in Fig. 1. This truncation implies a reduction of the effective dimensionality of the data from n to k in the energy dimension. Accordingly, a dataset is significantly denoised because most of the noise variations are removed with the omitted minor components.
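The truncated reconstruction can be sketched as follows. This is a minimal illustration, with the function name `pca_denoise` chosen for the example; the toy data use homoscedastic Gaussian noise, which is an assumption made only to keep the example short (for Poisson data the weighting and filtering pretreatments would come first).

```python
import numpy as np

def pca_denoise(D, k):
    """Reconstruct D (m pixels x n channels) from its k major principal components."""
    mean_spectrum = D.mean(axis=0)
    U, s, Vt = np.linalg.svd(D - mean_spectrum, full_matrices=False)
    return mean_spectrum + (U[:, :k] * s[:k]) @ Vt[:k]

# Toy data (hypothetical): two latent factors plus unit-variance Gaussian noise.
rng = np.random.default_rng(0)
m, n, k = 400, 50, 2
clean = rng.uniform(0, 10, size=(m, k)) @ rng.uniform(0, 10, size=(k, n))
noisy = clean + rng.normal(0.0, 1.0, size=(m, n))
denoised = pca_denoise(noisy, k)
error_before = np.mean((noisy - clean) ** 2)
error_after = np.mean((denoised - clean) ** 2)
```

In this toy case the mean squared error drops well below the noise variance because only k of the n possible directions in the energy space are kept.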

It can be higher than L as a result of experimental artifacts like changing the beam current in the course of scanning or changing response of the detector.

It can be higher than L due to nonlinearities in the spectra formation such as absorption in the mixture of light and heavy phases. These nonlinearities manifest themselves as the appearance of additional dimensions in the energy space unrelated to any latent factor [25].

It can be smaller than L if the variance of some minor components approaches the variance of noise. Then, these components might be irretrievable from PCA [30, 31, 32].
Loss of minor principal components in noisy datasets
Table 2 Extracted variances (\(\lambda\)) and “true” variances (\(\lambda^*\)) of the noisy and true synthetic dataset
Component  \(\lambda\)  \(\lambda^*\)  \(\lambda^*/\sigma^2\)  Retrievable
1          1228         1214           43.2                    ✔
2          938.3        906.4          32.3                    ✔
3          509.2        482.2          17.2                    ✔
4          444.7        422.8          15.1                    ✔
5          273.8        214.6          7.65                    ✔
6          94.04        40.05          1.43                    ✔
7          83.53        6.571          0.234                   ✔
8          81.98        0.5804         0.0207                  –
9          80.61        0.04119        1.47e−3                 –
10         78.30        5.13e−6        2.01e−7                 –
11         78.28        1.63e−6        6.43e−8                 –
Formula (6) also predicts that the range of detectable components can be extended by the smoothing filter pretreatment. Indeed, filtering reduces \(\sigma ^2\), so that smaller \(\lambda ^*\) can be retrieved (see footnote 2) [29]. This provides an additional argument in favor of the filtering pretreatment described in "PCA with smoothing filter pretreatment" section.
An estimation within the spiked covariance model is instructive for understanding why minor components might be lost in the course of PCA. However, Eq. (6) relies on precise knowledge of the “true” eigenvalues \(\lambda ^*\), which are not directly accessible in the experiment. In the next subsections we consider existing practical truncation methods that do not require the knowledge of these parameters.
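A detectability check of this kind can be sketched as follows. Equation (6) is not reproduced in this section, so the sketch assumes the commonly quoted spiked-covariance criterion that a component is retrievable when \(\lambda^*/\sigma^2\) exceeds \(\sqrt{n/m}\); the function name and the choice of m as the unbinned pixel count from the Methods are likewise assumptions, and the effective number of independent pixels may differ after filtering (footnote 2).

```python
import numpy as np

def retrievable(ratio, m, n):
    """Assumed criterion: lambda*/sigma^2 must exceed sqrt(n/m) for m pixels, n channels."""
    return np.asarray(ratio) > np.sqrt(n / m)

# Ratios lambda*/sigma^2 copied from the table of extracted and "true" variances.
ratios = np.array([43.2, 32.3, 17.2, 15.1, 7.65, 1.43, 0.234,
                   0.0207, 1.47e-3, 2.01e-7, 6.43e-8])
m, n = 244 * 336, 1200          # assumption: pixel count before binning
flags = retrievable(ratios, m, n)
number_retrievable = int(flags.sum())
```

Under these assumptions the threshold is about 0.12 and the first seven components are flagged, in line with the last column of the table. With a smaller effective m the threshold rises and the 7th component becomes marginal.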
Scree plot method for truncation of principal components
Historically, one of the earliest and most popular methods is analyzing a scree plot. This is based on the assumption that meaningful components show a data variance noticeably higher than that of the noise. The variance of noise components is assumed to follow some smooth curve, thus the meaningful and noise domains can be visually separated on scree plots such as those in Fig. 4f.
In most cases, the scree plot analysis leads to satisfactory results. However, this kind of truncation requires manual evaluation and is not accurately reproducible, as different people tend to set the border between the visually distinct regions slightly differently. For the considered noisy synthetic and experimental datasets in Fig. 4f, the border can be subjectively set between 6 and 8. It is also quite difficult to incorporate the scree plot approach into automatic algorithms because the behavior of the noise variance might vary significantly and modelling its dependence in the noise domain is problematic.
Analytical modelbased methods for truncating principal components
Recently, several approaches for analytical determination of the optimal number of principal components have emerged [34, 35, 36, 37]. These methods are based on certain models for the mixture of noise and useful signal, typically assuming the Gaussian nature of the noise.
The approach of Gavish and Donoho, as well as other similar approaches, requires precise knowledge of the level of homoscedastic noise \(\sigma\), which is, in practice, very difficult to extract from experimental data (see [35] for details). This can be performed by subtracting all meaningful principal components and evaluating the retained data fraction with the so-called real error function [38]. The approach implies a time-consuming iterative evaluation of the noise level alternated with the cut-off of meaningful components. Furthermore, as will be demonstrated in "Comparison of different truncation methods" section and Appendix 4, the accuracy of the resulting truncation is not guaranteed in the case of STEM EDXS spectrum-imaging.
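For orientation, a hard threshold of the Gavish and Donoho type can be sketched for the case where \(\sigma\) is known. The formula as rewritten by the authors is not reproduced in this section, so the sketch uses the published known-noise expression for the threshold on singular values as an assumption; the function names are hypothetical and the toy data contain Gaussian noise of known level, which is exactly the information that is hard to obtain in practice.

```python
import numpy as np

def gd_hard_threshold(shape, sigma):
    """Known-noise hard threshold on singular values (assumed Gavish-Donoho form)."""
    small, large = min(shape), max(shape)
    beta = small / large                       # aspect ratio, 0 < beta <= 1
    coef = np.sqrt(2 * (beta + 1)
                   + 8 * beta / ((beta + 1) + np.sqrt(beta**2 + 14 * beta + 1)))
    return coef * np.sqrt(large) * sigma

def count_components(X, sigma):
    """Number of singular values of X lying above the threshold."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > gd_hard_threshold(X.shape, sigma)))

# Toy data (hypothetical): rank-3 signal plus Gaussian noise with sigma = 1.
rng = np.random.default_rng(1)
m, n = 100, 200
U = np.linalg.qr(rng.normal(size=(m, 3)))[0]
V = np.linalg.qr(rng.normal(size=(n, 3)))[0]
signal = U @ np.diag([200.0, 150.0, 100.0]) @ V.T
X = signal + rng.normal(0.0, 1.0, size=(m, n))
k = count_components(X, sigma=1.0)
```

For a square matrix the coefficient reduces to \(4/\sqrt{3}\). The result depends directly on the value of \(\sigma\) supplied, so an error in the noise estimate shifts the cut-off.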
Anisotropy method for truncation of principal components
To overcome the limitations highlighted in the previous subsections, we suggest a novel practical method for truncation of principal components, which is flexible, objective and can be easily implemented in automatic algorithms.
It should be stressed, however, that the anisotropy method fails for sparse STEM XEDS data. In this case, the anisotropy criterion shows quite high values for both the meaningful and the noise components. The reason for that is apparent: if only a few data elements are assigned to one and the rest are zeros, a random-like asymmetry of scatter plots might be observed even if the underlying data distribution is isotropic. Therefore, a treatment reducing the data sparseness like that described in "PCA with smoothing filter pretreatment" section is obligatory prior to application of the anisotropy method.
Comparison of different truncation methods
Table 3 The number of principal components retained according to the different truncation methods: the evaluation of a scree plot with visual localisation of the inflection point ("Scree plot method for truncation of principal components" section), the approach of Gavish and Donoho ("Analytical modelbased methods for truncating principal components" section) and the anisotropy method ("Anisotropy method for truncation of principal components" section) using the projected histograms and an anisotropy threshold of 0.5
Dataset       Scree plot  Gavish and Donoho  Anisotropy
Synthetic     6–8         30                 7
Experimental  6–8         7                  7
Although the scree plot and anisotropy methods perform similarly, the latter offers a crucial advantage: the cut-off can be determined less subjectively. Localizing the inflection point in a scree plot is straightforward but might require a number of tunable parameters in an unsupervised treatment. In contrast, the method of scatter plot anisotropy can be easily incorporated into an automatic algorithm. The anisotropy oscillates around zero in the noise domain, which is very beneficial compared to a scree plot, where the variance decays slowly from an unknown level. Therefore, a single threshold parameter can be used to discriminate the meaningful and noise domains. This parameter represents the relative deviation from isotropy that can still be tolerated. In our experience, the threshold parameter can be set to 0.5–1.0 for STEM XEDS and EELS spectrum-imaging, depending on the required accuracy of the detection of minor principal components. It is also possible to define the threshold adaptively, depending on the measured variation of anisotropy in the region of very high component indices.
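The thresholding step alone can be sketched as follows. The computation of the anisotropy criterion from scatter plots is not reproduced in this section, so the sketch takes a ready-made anisotropy curve as input. The function names, the rule that the meaningful domain ends at the first component at or below the threshold, and the adaptive rule (a multiple of the standard deviation in the high-index tail) are all assumptions made for illustration, as are the toy values.

```python
import numpy as np

def truncate_by_anisotropy(anisotropy, threshold=0.5):
    """Number of leading components whose anisotropy exceeds the threshold."""
    a = np.asarray(anisotropy, dtype=float)
    below = np.nonzero(a <= threshold)[0]
    return int(below[0]) if below.size else a.size

def adaptive_threshold(anisotropy, tail=20, factor=5.0):
    """Hypothetical adaptive rule: `factor` times the spread of the last `tail` values."""
    tail_values = np.asarray(anisotropy, dtype=float)[-tail:]
    return float(factor * tail_values.std())

# Hypothetical curve: seven clearly anisotropic components, then oscillation around zero.
meaningful = np.array([3.1, 2.7, 2.2, 1.9, 1.4, 1.0, 0.7])
noise_domain = 0.1 * np.cos(np.arange(40))
curve = np.concatenate([meaningful, noise_domain])

k_fixed = truncate_by_anisotropy(curve, threshold=0.5)
k_adaptive = truncate_by_anisotropy(curve, threshold=adaptive_threshold(curve))
```

Because the criterion oscillates around zero in the noise domain, a single number separates the two domains, whereas a scree plot would require a model of the slowly decaying noise variance.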
The suggested anisotropy method employs a very basic property of random noise: its directional isotropy. It does not impose any strict assumptions on the specific nature of the noise, whether Poissonian, Gaussian, or mixed. The synthetic data presented in this paper are corrupted by Poisson noise, which is converted to Gaussian-like noise by a correctly performed weighting procedure. In real experimental data, some small fractions of noise might come from imperfections of registration, which makes the noise distribution more complicated. Some hints of that are the different slopes of the scree plots in the noise domains for the experimental and synthetic datasets in Fig. 4f. Nevertheless, the anisotropy method delivers identical truncation cut-offs for both datasets, which suggests a certain robustness against the nature of the noise.
Appendix 4 shows more examples of application of the anisotropy method for truncating principal components in STEM XEDS data. The anisotropy criterion behaves similarly (compare Figs. 7e, 13b and 14b) across the variety of STEM XEDS data: it shows quite high values for the meaningful components and then oscillates around zero in the noise domain. Furthermore, it has been demonstrated that the method works reliably for STEM EELS spectrum-images as well [40].
Reconstruction of denoised datasets
Conclusions
We conclude that experimental STEM XEDS spectrum-images acquired with modern STEM instrumentation and typical acquisition settings can be noticeably denoised by application of PCA. Two pretreatments of the typically sparse STEM XEDS datasets are essential for successful PCA: smoothing and weighting.
A crucial step for denoising spectrum images is the truncation of principal components. Theoretical consideration shows that the optimal number of retained components depends on the ratio between the levels of noise and expected meaningful variations in an object as well as on the number of pixels in a spectrum image.
We presented a promising method for optimally truncating principal components based on the analysis of the anisotropy of scatter plots resulting from the principal components decomposition. This method can be easily implemented in automatic algorithms, which promotes a smooth, unsupervised workflow.
Given the straightforward implementation of the presented PCA workflow and the power of the method for denoising datasets containing, e.g., only small concentrations of elements with sparse spectra, we anticipate a further increase in PCA applications to STEM-based spectrum-images as well as other hyperspectral techniques with similar dataset properties.
Methods
Experimental details
The STEM XEDS spectrum-imaging was performed in the Titan G2 (S)TEM microscope operating at 300 kV and equipped with the 4-window SDD XEDS detector. The TEM cross section of the CMOS device was prepared by FIB at 30 kV followed by Ga ion milling at 5 kV. The final thickness of the sample was approximately 50 nm.
The STEM scanning with collecting the XEDS signal was executed within 6 minutes in the multiframe mode across the \(244 \times 336\) pixel rectangle covering the area of approximately \(40 \times 50\) nanometers. The probe size was about 0.2 nm and the beam current was 120 pA. Although the spectra were originally acquired with 4096 energy channels, the data cube was then truncated to 1200 channels in the range of 0.2–12.2 keV that covered all useful XEDS peaks.
Details of simulation
A phantom object that mimicked a real CMOS transistor was generated as shown in Fig. 2b. The geometry of the layers was greatly simplified but their volume fractions were reproduced reasonably accurately. The composition of each layer was set according to Table 1 and then the borders among them were numerically smeared out to mimic the roughness of the layers in the real device and the spread of the STEM probe along the 50 nm sample thickness.
XEDS spectra were generated using the simulation program DTSA-II [41] developed at the National Institute of Standards and Technology. The simulation employed an acceleration voltage of 300 kV, a realistic model for an SDD detector, a sample thickness of 50 nm and the compositions of the layers as listed in Table 1. The generated spectrum-images consisted of the same number of STEM pixels (\(244 \times 336\)) and energy channels (1200) as the experimental dataset.
The synthetic data were prepared in two variants: one with no noise (the counts were represented by floating-point numbers, not truncated to integers) and another with Poisson noise added according to the nominal signal at each data point (here the counts were represented by integers, as in the experimental set). For the best compliance with the experiment, the synthetic spectrum-images were scaled such that the total number of counts in the range of 0.5–12 keV coincided with that of the experimental dataset.
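The preparation of the two synthetic variants can be sketched as follows. This is an illustration only: the function names are hypothetical, the toy cube and target count are invented, and for brevity the scaling is applied over all channels rather than over the 0.5–12 keV window used in the paper.

```python
import numpy as np

def scale_to_counts(noise_free, target_total):
    """Rescale a noise-free dataset so that its total signal equals target_total."""
    return noise_free * (target_total / noise_free.sum())

def add_poisson_noise(noise_free, rng):
    """Draw integer counts from a Poisson distribution with the nominal signal as its mean."""
    return rng.poisson(noise_free)

# Toy noise-free cube (hypothetical): floating-point expected counts.
rng = np.random.default_rng(0)
noise_free = rng.uniform(0.0, 1.0, size=(12, 16, 40))
target_total = 500.0                       # stand-in for the experimental total
scaled = scale_to_counts(noise_free, target_total)
noisy = add_poisson_noise(scaled, rng)
filled_fraction = np.count_nonzero(noisy) / noisy.size
```

At such low expected counts per element most entries of `noisy` are zero, which mirrors the sparsity of the experimental spectrum-images discussed in the main text.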
Footnotes
 1.
In the case when weighting is combined with centering a data set, the former should be executed first because data centering destroys the basic properties of Poisson distribution.
 2.
The more accurate analysis requires an introduction of the effective number of independent pixels \(m_{e}\), which can be also affected by filtering [29].
 3.
We rewrote original formula (3) from [36] to fit the definitions and notations used in the present paper.
Notes
Authors' contributions
PP collected the experimental data, developed the evaluation strategy and the code. AL analysed the various methods of truncating principal components and contributed to writing the manuscript. Both authors read and approved the final manuscript.
Acknowledgements
Nicholas W. M. Ritchie, National Institute of Standards and Technology, provided useful comments on the adaptation of the DTSA-II package to TEM simulations.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
The PCA treatment of the experimental and simulated objects was performed with the temDM MSA package downloadable at http://temdm.com/web/msa/. The synthetic datasets described in the present paper can be found at the same website under the name “synthetic STEM XEDS spectrumimages of a CMOS device”.
Funding
The authors acknowledge funding from Deutsche Forschungsgemeinschaft “Zukunftskonzept” (F003661553Ü6a1020605) and from European Research Council under the Horizon 2020 program (Grant 715620). The support by the Open Access Publishing Funds of the SLUB/TU Dresden is acknowledged.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
 1.Watanabe, M., Kanno, M., Okunishi, E.: Atomic resolution elemental mapping by EELS and XEDS in aberration corrected stem. Jeol News 45, 8–12 (2010)Google Scholar
 2.Schlossmacher, P., Klenov, D.O., Freitag, B.: Enhanced detection sensitivity with a new windowless XEDS system for AEM based on silicon drift detector technology. Microsc. Today 18(4), 14–120 (2010)CrossRefGoogle Scholar
 3.Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. 2, 559–572 (1901)CrossRefGoogle Scholar
 4.Hotelling, H.: Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24(6), 417–441 (1933)CrossRefGoogle Scholar
 5.Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. J. R. Stat. Soc. B 61, 611–622 (1997)CrossRefGoogle Scholar
 6.Jolliffe, I.T.: Principal component analysis, 2nd edn. Springer Verlag, Berlin (2002)Google Scholar
 7.Malinowski, E.R.: Factor analysis in chemistry, 3rd edn. Wiley, Hoboken (2002)Google Scholar
 8.Shlens, J.: A tutorial on principal component analysis. arXiv:1404.1100v1, (2014) https://arxiv.org/pdf/1404.1100.pdf
 9.Jolliffe, I.T., Cadima, J.: Principal component analysis: a review and recent developments. Philos. Trans. A 374, 20150202 (2016)CrossRefGoogle Scholar
 10.Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern 35(8), 1798–1828 (2013)CrossRefGoogle Scholar
 11. Baccheli, S., Papi, S.: Image denoising using principal component analysis in the wavelet domain. J. Comput. Appl. Math. 189(1–2), 606–621 (2006)
 12. Zhang, L., Dong, W., Zhang, D., Shi, G.: Two-stage image denoising by principal component analysis with local pixel grouping. Pattern Recognit. 43, 1531–1549 (2010)
 13. Deledalle, C., Salmon, J., Dalalyan, A.: Image denoising with patch-based PCA: local versus global. In: Proceedings of British machine vision conference, vol. 25, pp. 1–10 (2011). https://doi.org/10.5244/C.25.25
 14. Babu, Y.M.M., Subramanyam, M.V., Prasad, M.N.G.: PCA based image denoising. Signal Image Process. 3(2), 236–244 (2012)
 15. Kumarsahu, B., Swami, P.: Image denoising using principal component analysis in wavelet domain and total variation regularization in spatial domain. Int. J. Comput. Appl. 71(12), 40–47 (2013)
 16. Toyota, S., Fujiwara, I., Hirose, M.: Principal component analysis for the whole facial image with pigmentation separation and application to the prediction of facial images at various ages. J. Imaging Sci. Technol. 58(2), 02050111 (2014)
 17. Jiang, T.X., Huang, T.Z., Zhao, X.L., Ma, T.H.: Patch-based principal component analysis for face recognition. Comput. Intell. Neurosci. 5317850 (2017). https://doi.org/10.1155/2017/5317850
 18. Ng, S.C.: Principal component analysis to reduce dimension on digital image. Procedia Comput. Sci. 11, 113–119 (2017)
 19. Titchmarsh, J.M., Dumbill: Multivariate statistical analysis of FEG STEM EDX spectra. J. Microsc. 184, 195–207 (1996)
 20. Titchmarsh, J.M.: EDX spectrum modelling and multivariate analysis of sub-nanometer segregation. Micron 30, 159–171 (1999)
 21. Burke, M.G., Watanabe, Williams, D.B., Hyde, J.M.: Quantitative characterization of nanoprecipitates in irradiated low-alloy steel: advances in the application of FEG-STEM quantitative microanalysis to real materials. J. Mater. Sci. 41, 4512–4522 (2006)
 22. Kotula, P.G., Keenan, M.R., Michael, J.R.: Tomographic spectral imaging with multivariate statistical analysis: comprehensive 3D microanalysis. Microsc. Microanal. 12, 36–48 (2006)
 23. Yagushi, T., Konno, M., Kamino, T., Watanabe, M.: Observation of three-dimensional elemental distributions of a Si device using a \(360^\circ\)-tilt FIB and the cold-emission STEM system. Ultramicroscopy 108, 1603–1615 (2008)
 24. Parish, C.M., Brewer, L.N.: Multivariate statistics applications in phase analysis of STEM-EDS spectrum images. Ultramicroscopy 110, 134–143 (2010)
 25. Potapov, P.: Why principal component analysis of STEM spectrum images results in abstract, uninterpretable loadings? Ultramicroscopy 160, 197–212 (2016)
 26. Potapov, P., Engelmann, H.J.: TEM characterization of advanced devices in the semiconductor industry. 18th Conference Microscopy of Semiconducting Materials, Oxford (2013)
 27. Keenan, M.R., Kotula, P.G.: Accounting for Poisson noise in the multivariate analysis of TOF-SIMS spectrum images. Surf. Interface Anal. 36, 203–212 (2004)
 28. Kotula, P.G., Van Benthem, M.H.: Revisiting noise scaling for multivariate statistical analysis. Microsc. Microanal. 21(3), 1423–1424 (2015)
 29. Potapov, P., Longo, P., Okunishi, E.: Enhancement of noisy EDX HRSTEM spectrum-images by combination of filtering and PCA. Micron 96, 29–37 (2016)
 30. Lichtert, S., Verbeeck, J.: Statistical consequences of applying a PCA filter on EELS spectrum images. Ultramicroscopy 125, 35–42 (2013)
 31. Potapov, P.: On the loss of information in PCA of spectrum-images. Ultramicroscopy 182, 191–194 (2017)
 32. Jones, L., Varambhia, A., Beanland, R., Kepaptsoglou, D., Griffiths, I., Ishizuka, A., Azough, F., Freer, R., Ishizuka, K., Cherns, D., Ramasse, Q.M., Lozano-Perez, S., Nellist, P.: Managing dose, damage and data-rates in multi-frame spectrum-imaging. Microscopy 67(S1), 98–113 (2018)
 33. Nadler, B.: Finite sample approximation results for principal component analysis: a matrix perturbation approach. Ann. Stat. 36(6), 2791–2817 (2008)
 34. Perry, P.O.: Cross-validation for Unsupervised Learning. Ph.D. thesis, Stanford University (2009)
 35. Kritchman, S., Nadler, B.: Determining the number of components in a factor model from limited noisy data. Chemomet. Intell. Lab. Syst. 94, 19–32 (2008)
 36. Gavish, M., Donoho, D.L.: The optimal hard threshold for singular values is 4/\(\sqrt{3}\). IEEE Trans. Inf. Theory 60(8), 5040–5053 (2014)
 37. Gavish, M., Donoho, D.L.: Optimal shrinkage of singular values. IEEE Trans. Inf. Theory 63(4), 2137–2152 (2017)
 38. Malinowski, E.R.: Theory of error in factor analysis. Anal. Chem. 49(4), 606–611 (1977)
 39. Mardia, K.V.: Measures of multivariate skewness and kurtosis with applications. Biometrika 57(3), 519–530 (1970)
 40. Potapov, P., Longo, P., Lubk, A.: A novel method for automatic determination of the number of meaningful components in the PCA analysis of spectrum-images. Microsc. Microanal. 24(Suppl 1), 572–573 (2018)
 41. Newbury, D.E., Ritchie, N.W.M.: Performing elemental microanalysis with high accuracy and high precision by scanning electron microscopy/silicon drift detector energy-dispersive x-ray spectrometry. J. Mater. Sci. 50, 493–518 (2015)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.