Do Spectra Live in the Matrix? A Brief Tutorial on Applications of Factor Analysis to Resolving Spectral Datasets of Mixtures

Kałka, Andrzej J.; Turek, Andrzej M.

doi:10.1007/s10895-021-02753-w

Journal of Fluorescence, Volume 31, pages 1599–1616 (2021). Original Article, open access. Published: 06 August 2021.

Abstract

Despite the rapid growth of data-processing software, which has enabled huge advances in many fields of chemistry, some research problems remain challenging. A standard example is the analysis of multi-component mixtures. The classical approach to such a problem is to separate each component from the sample and measure it individually. The advent of computers, however, gave rise to a relatively new domain of data processing, chemometrics, which focuses on decomposing the signal recorded for the sample rather than the sample itself. Regrettably, very few chemometric methods have so far made their way into everyday laboratory routines. The Authors believe that a brief, ‘user-friendly’, guide-like article on several ‘flagship’ chemometric algorithms may, at least partly, stimulate interest in these techniques among researchers specializing in many fields of chemistry. In this paper, five different factor analysis techniques are used to analyse a three-component system of fluorophores. Applied to the excitation-emission spectra recorded for the ‘unknown’ mixture, these algorithms allowed its composition to be determined unambiguously without physical separation of the components. An example of using chemometric methods in physical chemistry research is also provided. For each technique, a short description of its theoretical background is followed by an example of its practical performance. In addition, the Reader is provided with basic information on matrix algebra, detailed experimental ‘recipes’, references to the specialist literature and ready-to-use MATLAB codes.

Motivation

Spectroscopic measurements have been, and still are, widely used to determine both the composition and the physicochemical properties of examined samples [1]. However, interpreting the obtained spectra, especially for multi-component samples, is not always straightforward. The ‘traditional’ way to obtain a selective signal for each substance, and thus characterize it unambiguously, is to physically separate it from the mixture [2–3]. This approach has a natural limitation, however, because not all mixture components can always be separated. It is also often time-consuming.

Fortunately, the development of computer science has made an alternative approach to investigating multicomponent samples available. This issue is now addressed by chemometrics. Chemometric techniques combine chemical knowledge, mathematical and statistical tools and numerical optimization routines to extract the desired information from the data effectively [4–7]. Consequently, there is no need to physically separate the components from the mixture: all the required information about the individual signals is obtained computationally.

Although plenty of articles in the highly specialised literature describe the basics and the use of chemometric techniques, these methods are still rather poorly reflected in the everyday analysis of complex spectral datasets. Perhaps this is because only a few of them are explained comprehensively, in a way that is fully understandable to a non-expert audience and illustrated with pictorial presentations [7–15].

For this reason, the Authors of this paper attempt to shed additional light on some of the ‘flagship’ chemometric methods for resolving spectral mixtures, which are seldom discussed outside the specialist literature. These include Target Factor Analysis (TFA) [16–17], Evolving Factor Analysis (EFA) [18–20], Rank Annihilation Factor Analysis (RAFA) [21–23] and the Generalized Rank Annihilation Method (GRAM) [24–25]. Each algorithm is given a brief description of its foundations, practical details and illustrative examples of its application, followed by suggested literature references. Its main advantages and some drawbacks are also discussed.

Four types of supplementary material are also included. Appendix A elaborates on selected mathematical issues. Appendix B gives detailed descriptions of the experiments so that the measurements can easily be repeated. Appendix C contains the MATLAB codes [26] for all the applied routines, which may be rewritten in any free programming language, such as R [27] or Python [28]. Finally, Appendix D contains the originally measured spectral data.

Theoretical Background

Brief Characteristics of UV-Vis Spectroscopy

UV-Vis absorption spectroscopy is one of the most commonly used methods for determining the composition or physico-chemical properties of tested samples. As each substance has its ‘unique’ spectrum, UV-Vis measurements can be (and are) used for qualitative analysis. Owing to the linear relationship between signal and concentration, UV-Vis spectroscopy is often (and primarily) applied to quantitative analysis. This relationship is described by the Lambert-Beer law:

$$ A=\varepsilon \cdotp l\cdotp c $$

(1a)

where the proportionality factor between absorbance (A) and concentration (c) is the optical path length (l) multiplied by the molar absorption coefficient (ε).

UV-Vis emission spectroscopy techniques can characterise the sample’s composition in a similar way [29, 30]. In that case, however, one basic condition must be fulfilled: at least one component of the analysed sample has to exhibit fluorescence, phosphorescence or another type of light emission.

Although the Lambert-Beer law does not strictly apply to emission spectroscopy, an analogous linear relationship, known as Parker’s law, holds for sufficiently (optically) dilute solutions (absorbance A < 0.1) [31]:

$$ {I}_{em}\approx 2.303\cdotp {\varphi}_{em}\cdotp {I}_{source}\cdotp A=2.303\cdotp {\varphi}_{em}\cdotp {I}_{source}\cdotp \varepsilon \cdotp l\cdotp c $$

(1b)

where the intensity of emitted light (I_em, the signal) is directly proportional to the concentration of the analyte. Besides the factors already present in (1a), the proportionality factor includes the quantum yield of the emission process (φ_em) and the intensity of the excitation light beam (I_source). The sensitivity of the measurements can therefore be easily modified by adjusting the parameters of the spectrofluorometer light source.
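The linear form (1b) can be checked numerically. The sketch below is a minimal Python illustration and is not part of the Authors’ SI. It assumes that the emitted intensity is proportional to the fraction of absorbed light, I_em = φ_em·I_source·(1 − 10^(−A)), and it ignores inner-filter and geometry effects. This expression is compared with the linearised 2.303·φ_em·I_source·A. Both function names are hypothetical.

```python
def emission_absorbed_fraction(absorbance, phi_em=1.0, i_source=1.0):
    """Emission proportional to the absorbed fraction of light, 1 - 10**(-A) (assumed model)."""
    return phi_em * i_source * (1.0 - 10.0 ** (-absorbance))


def emission_parker(absorbance, phi_em=1.0, i_source=1.0):
    """Linearised Parker's law, Eq. (1b): first-order expansion of 1 - 10**(-A)."""
    return 2.303 * phi_em * i_source * absorbance


for a in (0.01, 0.05, 0.1, 0.3):
    # Relative overestimate of the linear model with respect to the absorbed-fraction model
    rel_dev = emission_parker(a) / emission_absorbed_fraction(a) - 1.0
    print(f"A = {a:4.2f}: linear model overestimates by {rel_dev:6.1%}")
```

Under this simplified model the deviation grows steadily with A. It is roughly 1 % at A = 0.01 and about 12 % at A = 0.1, which is why the absorbance of the samples is kept low.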

Although UV-Vis emission measurements are mostly aimed at delivering fluorescence or phosphorescence spectra, the absorption characteristics of the sample can also be obtained. Fluorescence excitation spectra are recorded by changing the excitation wavelength and tracking the resulting signal at one particular emission wavelength. As the emitted light intensity is directly proportional to the absorbance (1b), fluorescence excitation spectra generally bear a very strong similarity to absorption spectra.

Combining excitation and emission spectra yields excitation-emission matrices, or maps (EEM). By varying both the emission and the excitation wavelength during the measurement, both the absorption and the fluorescent (phosphorescent) properties of the sample can be characterized at the same time.

In some cases, the attenuation (quenching) of the fluorescence emission is also exploited. Small portions of a substance called the quencher are then added to the sample. The quencher molecules weaken the intensity of the light emitted by the fluorophores through processes that include intermolecular electron or energy transfer between fluorophore and quencher molecules. Mathematically, this weakening of the fluorescence intensity is described by the linear Stern-Volmer equation [29, 30]:

$$ \frac{I_{em}^0}{I_{em}^Q}=1+{K}_{SV}\cdotp Q $$

(2)

According to this formula, the ratio of the emission intensity of the unquenched sample (I⁰_em) to that of the quenched one (I^Q_em) increases linearly with the concentration of the added quencher (Q). The slope of this relationship, characteristic of a given fluorophore-quencher pair, is called the Stern-Volmer quenching constant (K_SV).
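In practice, K_SV is obtained by fitting Eq. (2) to intensities measured at several quencher concentrations. The minimal NumPy sketch below uses synthetic data generated from a made-up K_SV, not a value from this study. It fits the slope of I⁰_em/I^Q_em − 1 against Q with the intercept fixed at 1, as Eq. (2) prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quencher concentrations (mol/dm^3) and a made-up K_SV used only
# to generate illustrative "measurements"; they are not data from this study.
q = np.array([0.0, 0.01, 0.02, 0.03, 0.04, 0.05])
k_sv_true = 120.0
i0 = 1000.0
iq = i0 / (1.0 + k_sv_true * q) * (1.0 + 0.005 * rng.standard_normal(q.size))  # 0.5 % noise

# Eq. (2): I0/IQ - 1 = K_SV * Q, i.e. a straight line through the origin
y = i0 / iq - 1.0
k_sv_fit = np.dot(q, y) / np.dot(q, q)   # least-squares slope with zero intercept
print(f"estimated K_SV = {k_sv_fit:.1f} dm^3/mol")
```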

Spectroscopic Data in Terms of Matrix Algebra

A recorded spectrum, either an absorption or an emission one, is a set of numerical values representing the intensity of the measured signal (x) as a function of wavelength (λ). Thus, from a mathematical point of view, a spectrum is a data vector x [32]:

$$ \mathbf{x}=\left[{x}_1;{x}_2;{x}_3;\dots; {x}_{\lambda}\right] $$

The vector x can be set as a column of values in a data spreadsheet (Fig. 1). Two or more spectra (vectors) combined column-wise form a spectral data matrix X:

$$ \mathbf{X}=\left[{\mathbf{x}}_{\mathbf{1}},{\mathbf{x}}_{\mathbf{2}},{\mathbf{x}}_{\mathbf{3}},\dots, {\mathbf{x}}_{\mathbf{n}}\right] $$

The spreadsheet therefore contains an array of dimensions λ × n, where n is the number of combined spectra and λ the number of measurement points (wavelengths, Fig. 1).

As the recorded signal is directly proportional to concentration (1a, b), the spectrum x_A measured for a particular sample of substance A can be expressed as the product of a ‘standard’ spectrum s_A, corresponding to unit molar concentration of solute A, and a multiplier c_A representing its actual concentration [5]:

$$ {\mathbf{x}}_{\mathbf{A}}={\mathbf{s}}_{\mathbf{A}}\cdotp {c}_A $$

For instance, if three substances, say A, B and C, are mixed together, the resulting spectrum x_ABC of the three-component mixture will, owing to signal additivity, be a linear combination of the three vectors (spectra) representing the individual components:

$$ {\mathbf{x}}_{\mathbf{A}\mathbf{BC}}={\mathbf{s}}_{\mathbf{A}}\cdotp {c}_A+{\mathbf{s}}_{\mathbf{B}}\cdotp {c}_B+{\mathbf{s}}_{\mathbf{C}}\cdotp {c}_C $$

(3)

By analogy, a set of spectra x_1,ABC, x_2,ABC, x_3,ABC, …, x_n,ABC, measured for n different mixtures of A, B and C, can be defined as follows

$$ {\displaystyle \begin{array}{c}{\mathbf{x}}_{\mathbf{1},\mathbf{ABC}}={\mathbf{s}}_{\mathbf{A}}\cdotp {c}_{1,A}+{\mathbf{s}}_{\mathbf{B}}\cdotp {c}_{1,B}+{\mathbf{s}}_{\mathbf{C}}\cdotp {c}_{1,C}\\ {}{\mathbf{x}}_{\mathbf{2},\mathbf{ABC}}={\mathbf{s}}_{\mathbf{A}}\cdotp {c}_{2,A}+{\mathbf{s}}_{\mathbf{B}}\cdotp {c}_{2,B}+{\mathbf{s}}_{\mathbf{C}}\cdotp {c}_{2,C}\\ {}\dots \\ {}{\mathbf{x}}_{\mathbf{n},\mathbf{ABC}}={\mathbf{s}}_{\mathbf{A}}\cdotp {c}_{n,A}+{\mathbf{s}}_{\mathbf{B}}\cdotp {c}_{n,B}+{\mathbf{s}}_{\mathbf{C}}\cdotp {c}_{n,C}\end{array}} $$

The above set of equations can be rewritten briefly in matrix notation as

$$ \mathbf{X}=\mathbf{S}\cdotp {\mathbf{C}}^{\mathbf{T}} $$

(4)

By convention, the matrix S (λ × f), called the matrix of spectral profiles (here f = 3), contains the standard spectra s_A, s_B and s_C of the ‘pure’ substances A, B and C. The vectors c_A, c_B and c_C, representing the actual concentrations of components A, B and C, are the columns of the n × 3 matrix C, called the matrix of concentration profiles. The superscript T denotes matrix transposition. A graphical scheme of the described matrix factorization is presented in Fig. 2.

Hence, given the ‘standard’ spectra of all components of a mixture, the concentrations of all substances (A, B and C) in each sample can be determined by a simple matrix operation:

$$ {\mathbf{C}}^{\mathbf{T}}={\mathbf{S}}^{+}\mathbf{X} $$

(5a)

The symbol S⁺ denotes a matrix pseudo-inverse

$$ {\mathbf{S}}^{+}={\left({\mathbf{S}}^{\mathbf{T}}\mathbf{S}\right)}^{-\mathbf{1}}{\mathbf{S}}^{\mathbf{T}} $$

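which satisfies S⁺S = 1 (here the f × f identity matrix) and is obtained by ‘inverting’ a rectangular matrix [33] (as shown in SI - App. A.1). This explicit form exists only if the columns of S (the pure-component spectra) are linearly independent, so that S^TS is invertible.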
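Equations (4) and (5a) can be illustrated with synthetic data. In the minimal NumPy sketch below, Gaussian bands stand in for the ‘pure’ spectra s_A, s_B and s_C, and random numbers stand in for the concentrations. Both are hypothetical, as is the helper `band()`. The concentrations are then recovered with the pseudo-inverse exactly as defined above.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.linspace(350, 500, 151)            # hypothetical wavelength grid (nm)


def band(center, width):
    """Gaussian band used as a stand-in 'pure' spectrum (purely illustrative)."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


# S: lambda x f matrix of unit-concentration spectra (f = 3)
S = np.column_stack([band(380, 12), band(410, 15), band(440, 10)])
# C: n x f matrix of concentrations for n = 5 hypothetical mixtures
C = rng.uniform(0.1, 1.0, size=(5, 3))

X = S @ C.T                                # Eq. (4): each column is a mixture spectrum
S_pinv = np.linalg.inv(S.T @ S) @ S.T      # pseudo-inverse (S^T S)^-1 S^T
C_T_est = S_pinv @ X                       # Eq. (5a)

print(np.allclose(C_T_est, C.T))           # noise-free data: concentrations recovered exactly
print(np.allclose(S_pinv, np.linalg.pinv(S)))
```

With noisy data the recovered concentrations are least-squares estimates rather than exact values. `np.linalg.pinv` computes the same matrix via SVD and is numerically more robust than forming (S^TS)⁻¹ explicitly.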

Singular Value Decomposition of a Data Matrix and its ‘Consequences’

In everyday laboratory practice, it often happens that both the spectral (S) and the concentration (C) profiles of the individual components are unknown, so the decomposition of X given by formula (4) cannot be used directly. However, a mathematical procedure known as Singular Value Decomposition (SVD) can always decompose the data matrix X into a product of three matrices, conventionally denoted U, Λ and V (Fig. 3) [5]:

$$ \mathbf{X}=\mathbf{U}\boldsymbol{\Lambda } {\mathbf{V}}^{\mathbf{T}} $$

(6)

The SVD matrices U (λ × n) and V (n × n) consist of two sets of eigenvectors (of XX^T and X^TX, respectively) and are column-wise orthonormal: U^TU = 1 and V^TV = 1, and, because V is square, also VV^T = 1 [5, 34]. Λ (n × n) is a diagonal matrix containing the singular values of X, conventionally sorted in descending order.
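These properties are easy to verify numerically. The short NumPy sketch below uses an arbitrary random λ × n matrix with λ > n. Note that `np.linalg.svd` returns V^T rather than V, and returns the singular values as a 1-D array.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((100, 8))                    # any lambda x n data matrix (lambda > n)

# Economy-size SVD: U is lambda x n, Lambda is n x n, V is n x n (Eq. 6)
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
Lam = np.diag(sv)

print(np.allclose(U.T @ U, np.eye(8)))      # U^T U = 1 (column-wise orthonormality)
print(np.allclose(Vt @ Vt.T, np.eye(8)))    # V^T V = 1
print(np.allclose(Vt.T @ Vt, np.eye(8)))    # V V^T = 1, since V is square
print(np.allclose(U @ Lam @ Vt, X))         # X = U Lambda V^T
print(np.all(np.diff(sv) <= 0))             # singular values sorted, largest first
```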

Decomposing X yields three matrices that do not actually contain the spectra or concentrations of the pure components. To understand the meaning and importance of this decomposition, a visual reference to geometry is helpful (Fig. 4). Formula (3), which defines the mixture spectrum x_ABC as the sum of the individual component spectra, is analogous to the representation of a vector p in a Cartesian coordinate system [32]:

$$ {\displaystyle \begin{array}{c}\mathbf{p}=x\cdotp \mathbf{x}+y\cdotp \mathbf{y}+z\cdotp \mathbf{z}\\ {}p=\left(x,y,z\right)\end{array}} $$

The unit vectors (versors) x, y and z then correspond to the vectors representing the ‘pure’ component spectra s_A, s_B and s_C. The multipliers (concentrations) c_A, c_B and c_C play the role of the respective ‘coordinates’. Unlike Cartesian axes, however, the axes of such a coordinate system do not, in general, have to be mutually orthogonal [34].

Consequently, if the ‘pure’ component spectra are unknown, the problem is how to define the axes of a coordinate system that can describe all the collected mixture spectra. This is exactly where the SVD procedure helps: a set of potentially useful axes (Fig. 4) can be found in the matrix U. However, as this matrix contains up to n eigenvectors u (Fig. 3), a decision has to be made as to how many, and which, of them should be chosen.

The diagonal matrix of singular values Λ shows how many axes are actually needed to describe the data matrix X, and hence how many components are present in the mixture. From the point of view of linear algebra, the dataset is described by as many independent variables (geometrically, axes) as there are singular values distinctly greater than zero [5]. The U, Λ and V matrices can therefore be ‘truncated’ to the ‘proper’ number of f columns (Fig. 3, grey areas). Truncation is commonly marked with a bar above the ‘reduced’ quantity. The cut-off number f is called the number of significant factors, principal components or primary latent variables. This gives a ‘recipe’ for constructing a coordinate system for the X dataset. In general, the set Ū of orthogonal axes defined in this way will not coincide with the ‘original’ axes corresponding to the ‘pure’ component spectra s_A, s_B and s_C. Nevertheless, the space spanned by the vectors u₁, u₂ and u₃ will be identical (Fig. 5):

$$ {\mathbf{u}}_{\mathbf{1}}\cdotp {\lambda}_1\cdotp {v}_{n,1}+{\mathbf{u}}_{\mathbf{2}}\cdotp {\lambda}_2\cdotp {v}_{n,2}+{\mathbf{u}}_{\mathbf{3}}\cdotp {\lambda}_3\cdotp {v}_{n,3}={\mathbf{x}}_{\mathbf{n},\mathbf{ABC}}={\mathbf{s}}_{\mathbf{A}}\cdotp {c}_{n,A}+{\mathbf{s}}_{\mathbf{B}}\cdotp {c}_{n,B}+{\mathbf{s}}_{\mathbf{C}}\cdotp {c}_{n,C} $$

It is therefore easy to show (see SI - App. A.3) that the vectors u₁, u₂ and u₃ are linear combinations of the pure component spectra s_A, s_B and s_C:

$$ {\overline {\mathbf{u}}}_{\mathbf{i}}={r}_{i,A}\cdotp {\mathbf{s}}_{\mathbf{A}}+{r}_{i,B}\cdotp {\mathbf{s}}_{\mathbf{B}}+{r}_{i,C}\cdotp {\mathbf{s}}_{\mathbf{C}} $$

(7a)

Conversely, each pure component spectrum is a linear combination of the eigenvectors:

$$ {\mathbf{s}}_{\mathbf{i}}={r}_{i,1}^{\prime}\cdotp {\mathbf{u}}_{\mathbf{1}}+{r}_{i,2}^{\prime}\cdotp {\mathbf{u}}_{\mathbf{2}}+{r}_{i,3}^{\prime}\cdotp {\mathbf{u}}_{\mathbf{3}} $$

(7b)

and can be rewritten in a concise matrix notation as

$$ {\overline {\mathbf{u}}}_{\mathbf{i}}=\mathbf{S}{\mathbf{r}}_{\mathbf{i}}\kern1.33em \mathrm{and}\kern1.33em {\mathbf{s}}_{\mathbf{i}}=\overline {\mathbf{U}}{\mathbf{r}}_{\mathbf{i}}^{\prime } $$

(7c)

Of course, the set of linear combination coefficients r and r’ remains unknown until the true spectra S are recovered. Nevertheless, the properties of the SVD matrices presented above are very useful in the analysis of the complex spectroscopic data.
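The relations (7a-c) can be demonstrated on simulated data. The sketch below uses Python with NumPy. The Gaussian ‘pure’ spectra, the random concentrations and the helper `band()` are all hypothetical. It shows that, with a little noise, exactly three singular values stand out. It also shows that each pure spectrum is recovered almost perfectly as a combination of the three retained eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(3)
wl = np.linspace(350, 500, 151)                 # hypothetical wavelength grid (nm)


def band(center, width):
    """Gaussian band used as a stand-in 'pure' spectrum."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


S = np.column_stack([band(380, 12), band(410, 15), band(440, 10)])
C = rng.uniform(0.1, 1.0, size=(20, 3))         # 20 hypothetical mixtures
X = S @ C.T + 5e-4 * rng.standard_normal((wl.size, 20))   # Eq. (4) plus small noise

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
print("leading singular values:", np.round(sv[:6], 4))

f = 3
U_bar = U[:, :f]                                # truncated set of eigenvectors
R_prime = U_bar.T @ S                           # r'_i for each pure spectrum; U_bar^+ = U_bar^T
S_rebuilt = U_bar @ R_prime                     # Eq. (7c): s_i = U_bar r'_i
print("max |S_rebuilt - S| =", np.max(np.abs(S_rebuilt - S)))
```

In real work the pure spectra are unknown, so the coefficients r' cannot be computed this way. The sketch only confirms that the pure spectra lie in the space spanned by Ū.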

Finally, the procedure of data reproduction is also worth mentioning. It consists of calculating the product of the U, Λ and V^T matrices ‘truncated’ to f columns (Fig. 3):

$$ \mathbf{X}=\overline {\mathbf{X}}+\mathbf{E}=\overline {\mathbf{U}}\overline {\boldsymbol{\Lambda}}\overline {{\mathbf{V}}^{\mathbf{T}}}\kern0.45em +\mathbf{E} $$

(8)

As a result, the original dataset X is ‘idealised’ to an f-variate system: any features that do not fit the adopted f-component model are rejected. These ‘misfits’, collected in the error matrix E, are often assumed to represent undesirable measurement noise [5].
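The noise-rejecting effect of Eq. (8) can be shown with a small simulation. The NumPy sketch below uses hypothetical Gaussian spectra, random concentrations and added white noise. It compares the deviation of the raw data X and of the truncated reproduction X̄ from the noise-free matrix, which is known here only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(4)
wl = np.linspace(350, 500, 151)                 # hypothetical wavelength grid (nm)


def band(center, width):
    """Gaussian band used as a stand-in 'pure' spectrum."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


def rms(a):
    """Root-mean-square of all matrix entries."""
    return np.sqrt(np.mean(a ** 2))


S = np.column_stack([band(380, 12), band(410, 15), band(440, 10)])
C = rng.uniform(0.1, 1.0, size=(20, 3))
X_true = S @ C.T
X = X_true + 0.01 * rng.standard_normal(X_true.shape)    # simulated measurement noise

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
f = 3
X_bar = U[:, :f] @ np.diag(sv[:f]) @ Vt[:f, :]            # Eq. (8): 'idealised' data
E = X - X_bar                                              # residuals rejected by the model

print(f"rms(X - X_true)     = {rms(X - X_true):.4f}")
print(f"rms(X_bar - X_true) = {rms(X_bar - X_true):.4f}")
```

Only the part of the noise that happens to lie in the f-dimensional model survives in X̄, so the idealised data are noticeably closer to the noise-free matrix.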

Experimental Model System

To demonstrate a practical use of factor analysis for interpreting spectroscopic data, a model experimental system was prepared (see SI - App. B). Methanol solutions of anthracene (A), 9-cyanoanthracene (CNA), 9,10-dicyanoanthracene (DCNA) and 9,10-diphenylanthracene (DPhA) were chosen for the study [35–36]. This choice was motivated by the fact that anthracene and its derivatives exhibit easily measurable fluorescence. In addition, the selected substances can mimic a post-reaction mixture hypothetically obtained in the synthesis of the monocyano derivative (CNA) from anthracene (A). The dicyano derivative (DCNA) is then a by-product, and DPhA can be treated as an impurity that should not be present in the reaction system. Thus, a three-component mixture of A, CNA and DCNA was prepared from 0.4 cm³, 0.5 cm³ and 0.3 cm³ of the respective base solutions (see SI - App. B, Fig. 6). To maintain the linear dependence of the signal on concentration (1b), all solutions were kept properly diluted: the maximum absorbance was always below the limit value of 0.1 (e.g. in Fig. 6) [31].

For each fluorophore, as well as for the mixture, the set of absorption, excitation (EX) and emission (EM) spectra was measured (Fig. 7). For the CNA and DCNA samples the excitation-emission maps (EEM) were also recorded.

A Practical Example of Factor Analysis Performed on Excitation-Emission Maps

How Many Components Are in a Mixture?

A single absorption or emission spectrum of the ‘unknown’ mixture (Fig. 6) usually makes it very difficult to determine how many components the mixture contains. However, a ‘pack’ of several spectra grouped into an excitation-emission map (EEM) is much more informative. Adding a quencher to the sample (Fig. 8) may reveal some ‘extra’ knowledge, as the emission of each fluorescent species is quenched to a slightly different extent (2).

In the studied case, even a ‘quick look’ at the recorded EEM reveals that the spectra can be divided into (at least) two distinct categories (Fig. 8). The first is characterised by a set of ‘spiky’ bands, while the second is dominated by ‘smooth’ and ‘diffuse’ bands. This distinction becomes even more apparent upon the addition of potassium iodide (KI) as a quencher (Fig. 8, right panel; SI - Appendix B.4.3). It can therefore be stated immediately that the mixture consists of at least two components. However, determining the correct number of significant factors responsible for the total variance of the analysed dataset requires a more sophisticated and reliable method than ‘organoleptic’ (visual) assessment. Principal Component Analysis (PCA) is one of the most popular approaches for this purpose [37]. As PCA has already been widely discussed elsewhere (for relevant examples see [8, 13]), only its main features are outlined below.

Since the excitation-emission map can be treated as a data matrix X_MIX, it can be factorized by SVD, which yields a set of singular values Λ (6). Recall that the number of large, non-zero singular values λ (equivalently, of eigenvalues of X^TX, which are their squares) should equal the number of significant factors responsible for the variance of the analysed dataset. To distinguish between significant and zero-like singular values [5, 13], statistical criteria such as those proposed by Malinowski [38] (S.5, S.6) can additionally be applied (see Table 1 and SI - App. A.4).

Table 1 Successive singular values λ of the X_MIX data matrix (Fig. 8), consisting of 81 fluorescence spectra, with the corresponding relative σ² (S.5) and cumulative explained variance Σ (S.6). The indicated number of significant factors (f = 3) is marked with an exclamation mark (!)

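The singular-value analysis can be sketched with a simulated EEM. In this minimal NumPy example, each of three hypothetical fluorophores contributes a rank-one map: the outer product of an emission band and an excitation band. A noise-free map of this kind therefore has rank three. All grids, band positions and weights are made up. The cumulative explained variance is computed from squared singular values, which is one common definition. Whether it matches the Authors’ criterion (S.6) exactly is an assumption, and Malinowski’s statistic (S.5) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)
em = np.linspace(380, 550, 171)   # hypothetical emission grid (nm), rows
ex = np.linspace(300, 420, 81)    # hypothetical excitation grid (nm): 81 spectra, as in Table 1


def band(grid, center, width):
    """Gaussian band standing in for a real spectrum (illustrative only)."""
    return np.exp(-0.5 * ((grid - center) / width) ** 2)


# (emission centre, emission width, excitation centre, excitation width) per fluorophore
components = [(400, 15, 340, 12), (440, 18, 370, 15), (480, 20, 400, 10)]
weights = [0.4, 0.5, 0.3]         # arbitrary relative contributions

X_mix = sum(w * np.outer(band(em, e0, ew), band(ex, x0, xw))
            for w, (e0, ew, x0, xw) in zip(weights, components))
X_mix = X_mix + 1e-3 * rng.standard_normal(X_mix.shape)

sv = np.linalg.svd(X_mix, compute_uv=False)
explained = sv ** 2 / np.sum(sv ** 2)    # share of total variance per factor
cumulative = np.cumsum(explained)
for i in range(6):
    print(f"f = {i + 1}: lambda = {sv[i]:.4f}, cumulative = {cumulative[i]:.6f}")
```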

In addition, a graphical analysis of the eigenvectors can be performed [39]. As significant eigenvectors and ‘pure’ component spectra are mutually related (7a-c, Fig. 5), the shape of a significant eigenvector should somewhat resemble that of the measured UV-Vis spectra (wide and diffuse ‘bands’). Non-significant eigenvectors, on the other hand, are expected to have an irregular, chaotic shape reflecting random noise [39, 5].

The successive eigenvectors u of the matrix X_MIX (Fig. 9) show that only the first three have a ‘regular’ shape. The fourth eigenvector (and all that follow) remain ‘rugged’ and do not exhibit any characteristic features. It can therefore be concluded that the full excitation-emission map is made up of combinations of only three independent spectra, which is fully consistent with the true composition of the analysed three-component sample (A + CNA + DCNA).
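The visual ‘smooth versus rugged’ criterion can also be expressed as a number. The sketch below replaces the visual inspection used in the text with the lag-one autocorrelation of each eigenvector. This is a commonly used heuristic, not the Authors’ procedure. It is applied to a simulated three-component EEM built from hypothetical Gaussian bands, and the function name `lag1_autocorrelation` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
em = np.linspace(380, 550, 171)   # hypothetical emission grid (nm)
ex = np.linspace(300, 420, 81)    # hypothetical excitation grid (nm)


def band(grid, center, width):
    """Gaussian band standing in for a real spectrum (illustrative only)."""
    return np.exp(-0.5 * ((grid - center) / width) ** 2)


X_mix = (0.4 * np.outer(band(em, 400, 15), band(ex, 340, 12))
         + 0.5 * np.outer(band(em, 440, 18), band(ex, 370, 15))
         + 0.3 * np.outer(band(em, 480, 20), band(ex, 400, 10))
         + 1e-3 * rng.standard_normal((em.size, ex.size)))

U, sv, Vt = np.linalg.svd(X_mix, full_matrices=False)


def lag1_autocorrelation(u):
    """Close to 1 for a smooth, band-like vector; near 0 for a noise-like vector."""
    return float(np.dot(u[:-1], u[1:]) / np.dot(u, u))


smoothness = [lag1_autocorrelation(U[:, i]) for i in range(6)]
print(np.round(smoothness, 2))
```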

In general, the applied criteria allow the number of fluorescent components in a mixture to be determined reliably (for the PCA routine, see SI, App. C.1). It is still unclear, however, what these substances are and what their concentrations are. Even so, the results suggest that, at this stage, computational analysis of the spectra can successfully replace ‘traditional’ methods such as chromatography [2] or electrophoresis [3], which would yield a similar outcome.

Which Substances May or May Not Be Present in a Studied Mixture?

If the analysed sample is suspected to contain certain known substances, the SVD of the excitation-emission data matrix may be used to confirm or reject this presumption. The Target Factor Analysis (TFA) approach is dedicated specifically to this purpose [16–17, 9]. The first step of TFA is to compile a limited set of substances potentially present in the sample. Next, the spectra of all these substances should be gathered, either from one’s own measurements or from a suitable spectral database. The following reasoning may then be carried out. If the mixture actually contains one of the ‘targeted’ substances, its spectrum should be related to the abstract spectra of the analysed data matrix by a linear transformation (7a-c, Fig. 5). In other words, a proper combination of the significant eigenvectors u is expected to fully reproduce the ‘target’ test spectrum s_T (7b). Conversely, if the substance is not present in the analysed sample, then, in general, no combination of the abstract spectra u will be able to fully restore its ‘original’ spectrum.

This reasoning can be formulated mathematically in three consecutive steps. First, for the ‘target’ spectrum s_T, the optimum (least-squares) coefficients r of a linear combination of the significant eigenvectors u are determined (7a).

$$ \mathbf{r}={\overline {\mathbf{U}}}^{+}{\mathbf{s}}_{\mathbf{T}} $$

Since the columns of Ū are orthonormal, Ū⁺ is simply Ū^T. Then, from the calculated coefficients r, a ‘new’ spectrum ŝ_T is reconstructed from the eigenvectors (7b).

$$ {\hat{\mathbf{s}}}_{\mathbf{T}}=\overline {\mathbf{U}}\mathbf{r} $$

Finally, the reproduced spectrum ŝ_T is compared to the ‘initial’ one, s_T.

$$ {\mathbf{s}}_{\mathbf{T}}\overset{?}{=}{\hat{\mathbf{s}}}_{\mathbf{T}}\kern1.33em \mathrm{or}\kern1.33em {\mathbf{s}}_{\mathbf{T}}\ne {\hat{\mathbf{s}}}_{\mathbf{T}} $$

Equivalently, the ‘target’ test spectrum s_T is projected onto the subspace spanned by the significant eigenvectors, i.e. the space of the data points (Fig. 5, see SI - App. A.3). The projection ŝ_T is then compared with the ‘original’ target spectrum s_T.

The two vectors can be compared graphically by plotting the elements of s_T on the x-axis against the corresponding elements of ŝ_T on the y-axis (Fig. 10). If the ‘target’ test spectrum indeed contributed to the measured spectra of the analysed mixture, s_T and ŝ_T will be almost identical, and a one-to-one correlation (a straight line y = x) will be observed. If, however, the original and projected spectra differ significantly (the linear correlation is lost), it can be concluded that the ‘targeted’ substance was not a component of the sample.
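The three TFA steps can be sketched numerically. The minimal NumPy example below is not the Authors’ MATLAB routine from SI, App. C.3. It builds a simulated EEM from three made-up Gaussian emission/excitation pairs and tests four made-up emission ‘targets’, only three of which were put into the mixture. The labels merely echo the substances in the text and do not represent their real spectra. Besides the correlation shown graphically in Fig. 10, the sketch reports the relative residual ‖s_T − ŝ_T‖ / ‖s_T‖.

```python
import numpy as np

rng = np.random.default_rng(6)
em = np.linspace(380, 550, 171)      # hypothetical emission grid (nm), rows of X_mix
ex = np.linspace(300, 420, 81)       # hypothetical excitation grid (nm), columns of X_mix


def band(grid, center, width):
    """Gaussian band standing in for a real spectrum (illustrative only)."""
    return np.exp(-0.5 * ((grid - center) / width) ** 2)


# Made-up emission 'target' spectra; only the first three are put into the mixture.
targets = {"A": band(em, 400, 15), "CNA": band(em, 440, 18),
           "DCNA": band(em, 480, 20), "DPhA": band(em, 415, 8)}
excitation = {"A": band(ex, 340, 12), "CNA": band(ex, 370, 15), "DCNA": band(ex, 400, 10)}
weights = {"A": 0.4, "CNA": 0.5, "DCNA": 0.3}

X_mix = sum(w * np.outer(targets[k], excitation[k]) for k, w in weights.items())
X_mix = X_mix + 1e-3 * rng.standard_normal(X_mix.shape)

U, sv, Vt = np.linalg.svd(X_mix, full_matrices=False)
U_bar = U[:, :3]                                  # f = 3 significant eigenvectors

for name, s_t in targets.items():
    r = U_bar.T @ s_t                             # r = U_bar^+ s_T (orthonormal columns)
    s_hat = U_bar @ r                             # target projected onto the data space
    resid = np.linalg.norm(s_t - s_hat) / np.linalg.norm(s_t)
    corr = np.corrcoef(s_t, s_hat)[0, 1]
    print(f"{name:5s} correlation = {corr:.4f}, relative residual = {resid:.3f}")
```

The three targets present in the simulated mixture are reproduced almost exactly. The narrow ‘DPhA’ band cannot be assembled from the three broad eigenvectors and leaves a large residual.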

The above algorithm (see the TFA routine, SI, App. C.3) was applied to the model excitation-emission map X_MIX, with the individual fluorescence spectra of A, CNA, DCNA and DPhA (Fig. 7) used as ‘targets’ (see SI - Appendix B.4.2). A linear correlation (Fig. 10) is observed for the first three, which suggests that the mixture consists of A, CNA and DCNA. In contrast, the ‘original’ spectrum of DPhA and the spectrum ‘assembled’ from the eigenvectors u remain significantly different. The absence of DPhA in the sample is thus graphically confirmed.

The presented example shows that target factor analysis can be a powerful tool for validating the composition of an analysed sample, provided that adequate auxiliary ‘targets’ are available. Consequently, TFA should be of particular interest in synthetic chemistry, as it allows the purity of final products to be assessed with respect to possible contaminants.

How Much of a Component Is in a Sample?

Factor analysis also allows the amount of a given substance in a sample to be determined without physically separating it. One of the algorithms dedicated to this purpose is Rank Annihilation Factor Analysis (RAFA) [21–23]. If adequately ‘calibrated’ spectra S of all components of a mixture are known, all component concentrations C can be determined simultaneously by the direct matrix calculation already mentioned (5a):

$$ {\mathbf{C}}^{\mathbf{T}}={\mathbf{S}}^{+}{\mathbf{X}}_{\mathbf{MIX}} $$

But what if the researcher is interested in the concentrations of only a few selected components, e.g. the main products of a synthesis or a given contaminant? Instead of preparing a series of calibration solutions for all mixture components (including those not under consideration), the following reasoning can be applied. Since signals in UV-Vis measurements are additive, the spectra of a mixture can be represented as the sum of the spectra of its individual components. The excitation-emission map recorded for the mixture of A, CNA and DCNA, X_MIX, is then a sum of three matrices

$$ {\mathbf{X}}_{\mathbf{MIX}}={\mathbf{X}}_{\mathbf{A}}+{\mathbf{X}}_{\mathbf{CNA}}+{\mathbf{X}}_{\mathbf{DCNA}} $$

combining the contributions of particular components.

Analogously, measuring the excitation-emission map of a calibration sample of an individual component, e.g. CNA, yields a reference EEM matrix Y_CNA. Because the signal remains directly proportional to concentration (1a, b), the following relation holds for any pair of corresponding entries of the X_CNA and Y_CNA matrices:

$$ \frac{x_{CNA}}{y_{CNA}}=\frac{c_x}{c_y}={\tau}_0 $$

(11)

Here c_x denotes the sought, unknown concentration of CNA in the analysed sample, and c_y the known concentration of the standard. In matrix notation, this can be written as

$$ {\mathbf{X}}_{\mathbf{CNA}}=\frac{c_x}{c_y}\cdotp {\mathbf{Y}}_{\mathbf{CNA}}={\tau}_0\cdotp {\mathbf{Y}}_{\mathbf{CNA}} $$

The scaling parameter τ₀ is here the ratio of the CNA concentration in the analysed and reference (calibration) sample. Consequently, the X_MIX matrix can be presented as:

$$ {\mathbf{X}}_{\mathbf{MIX}}={\mathbf{X}}_{\mathbf{A}}+{\tau}_0\cdotp {\mathbf{Y}}_{\mathbf{CNA}}+{\mathbf{X}}_{\mathbf{DCNA}} $$

Of course, τ₀ is just as unknown as c_x. It can, however, be determined easily by the following scheme. Subtracting the reference matrix Y_CNA, scaled by an arbitrary parameter τ, from X_MIX produces a difference matrix D_MIX:

$$ {\mathbf{D}}_{\mathbf{MIX}}={\mathbf{X}}_{\mathbf{MIX}}-\tau \cdotp {\mathbf{Y}}_{\mathbf{CNA}}={\mathbf{X}}_{\mathbf{A}}+\left({\tau}_0-\tau \right)\cdotp {\mathbf{Y}}_{\mathbf{CNA}}+{\mathbf{X}}_{\mathbf{D}\mathbf{CNA}} $$

(12)

In general, the matrix D_MIX will have three significant factors (f = 3), just like the data matrix X_MIX. However, if the arbitrarily chosen parameter τ happens to equal τ₀, the difference matrix D⁰_MIX will consist of only two components:

$$ {\mathbf{D}}_{\mathbf{MIX}}^{\mathbf{0}}={\mathbf{X}}_{\mathbf{MIX}}-{\tau}_0\cdotp {\mathbf{Y}}_{\mathbf{CNA}}={\mathbf{X}}_{\mathbf{A}}+{\mathbf{X}}_{\mathbf{D}\mathbf{CNA}} $$

as the contribution of CNA will be annihilated. As a result, the number of significant non-zero singular values of D_MIX is reduced by one (from three to two). The ‘last’ significant singular value λ_f (in the studied case, the third one) thus becomes an ‘indicator’ that can be used to find the ‘correct’ value of τ. As τ approaches τ₀, λ_f decreases, and at the ‘critical point’ (τ = τ₀) it reaches a value close to zero. Although a random search for the optimal τ is always possible, a systematic search is far more efficient. A sequence of scaling parameters τ is generated (e.g. τ = 0.00, 0.01, 0.02, ..., 1.00) and the evolution of the f-th singular value of D_MIX is traced. This is the so-called iterative variant of rank annihilation factor analysis [21, 22]. An alternative, direct version [23] of this approach will be discussed in Chapter 4.5 (GRAM).
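The iterative scan can be sketched in a few lines of NumPy. The example is not the Authors’ RAFA routine (SI, App. C.4). It uses simulated rank-one EEMs built from hypothetical Gaussian bands, and the true ratio `tau0_true` is a made-up value used only to generate the mixture. The τ grid follows the sequence given above.

```python
import numpy as np

rng = np.random.default_rng(7)
em = np.linspace(380, 550, 171)      # hypothetical emission grid (nm)
ex = np.linspace(300, 420, 81)       # hypothetical excitation grid (nm)


def band(grid, center, width):
    """Gaussian band standing in for a real spectrum (illustrative only)."""
    return np.exp(-0.5 * ((grid - center) / width) ** 2)


def eem(e0, ew, x0, xw):
    """Hypothetical calibration EEM of one fluorophore (rank-one outer product)."""
    return np.outer(band(em, e0, ew), band(ex, x0, xw))


Y_a, Y_cna, Y_dcna = eem(400, 15, 340, 12), eem(440, 18, 370, 15), eem(480, 20, 400, 10)
tau0_true = 0.8                      # made-up c_x / c_y for CNA, used only to build the data
X_mix = 0.4 * Y_a + tau0_true * Y_cna + 0.3 * Y_dcna
X_mix = X_mix + 1e-3 * rng.standard_normal(X_mix.shape)

f = 3
taus = np.round(np.arange(0.0, 1.0001, 0.01), 2)           # tau = 0.00, 0.01, ..., 1.00
# f-th singular value of D_MIX = X_MIX - tau * Y_CNA for every tau (Eq. 12)
lam_f = [np.linalg.svd(X_mix - t * Y_cna, compute_uv=False)[f - 1] for t in taus]
tau0_est = taus[int(np.argmin(lam_f))]
print(f"tau0 estimate = {tau0_est:.2f}")
```

The resolution of the estimate is limited by the step of the τ grid. A finer grid, or a local minimiser around the coarse minimum, can refine it.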

For the model mixture of three fluorophores discussed here, an exemplary quantitative RAFA procedure (RAFA routine, see SI, App. C.4) consists of determining the amount of CNA, which acts as the main reaction product. The excitation-emission maps of the calibration sample (0.5 cm³ / 10 cm³, see SI, App. B) must then be recorded (Fig. 11). For comparison, the contribution of DCNA, corresponding to the by-product, is also quantified.

At this point, it is worth briefly describing how the measured data can be ‘idealised’ by reproducing them from their SVD (see routine C.2 in SI, App. C). The SVD of the Y_CNA matrix (Chapter 4.1) yields one significant singular value λ and one pair of vectors u and v^T (only one component, CNA, is present). By reproducing the excitation-emission matrix of CNA according to (8) as

$$ {\overline {\mathbf{Y}}}_{\mathbf{CNA}}={\mathbf{u}}_{\mathbf{1}}{\lambda}_1{\mathbf{v}}_{\mathbf{1}}^{\mathbf{T}} $$

a noticeable ‘improvement’ in the shape of EEM can be observed (Fig. 11). Compared to the ‘raw’ data, the random noise and residues from the Rayleigh scattering band, which are a characteristic obstacle for the analysis of the excitation-emission maps, are successfully removed.
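The ‘idealisation’ step can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not routine C.2 from the SI: a single-component map is simulated as the outer product of two hypothetical Gaussian bands, only random noise is added (no Rayleigh band is simulated), and the map is rebuilt from the first singular triplet as in the equation above.

```python
import numpy as np

rng = np.random.default_rng(1)
em = np.linspace(380, 560, 91)   # emission wavelengths (nm), hypothetical grid
ex = np.linspace(300, 440, 71)   # excitation wavelengths (nm), hypothetical grid
s_em = np.exp(-0.5 * ((em - 450) / 20) ** 2)
s_ex = np.exp(-0.5 * ((ex - 380) / 15) ** 2)
Y_true = np.outer(s_em, s_ex)                              # noise-free one-component EEM
Y_raw = Y_true + 0.02 * rng.standard_normal(Y_true.shape)  # simulated 'raw' measurement

# Keep only the first singular triplet: Y_bar = u_1 * lambda_1 * v_1^T
U, lam, Vt = np.linalg.svd(Y_raw, full_matrices=False)
Y_bar = lam[0] * np.outer(U[:, 0], Vt[0, :])

err_raw = np.linalg.norm(Y_raw - Y_true)   # Frobenius norm of the deviation
err_bar = np.linalg.norm(Y_bar - Y_true)
print(f"deviation from the noise-free map: raw {err_raw:.3f}, idealised {err_bar:.3f}")
```

Most of the random noise lies outside the one-dimensional subspace spanned by u₁ and v₁, which is why the rank-one reconstruction is much closer to the noise-free map than the raw data.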

Using the ‘idealised’ reference maps Y_CNA and Y_DCNA, the contributions of both CNA and DCNA to the recorded mixture signal (12) can be determined. The iterative RAFA algorithm (Fig. 12) is applied to find, within the set of τ values, the optimal scaling factor τ₀ that minimises the third (f = 3) singular value of the difference matrices

$$ {\mathbf{D}}_{\mathbf{MIX}}={\mathbf{X}}_{\mathbf{MIX}}-\tau \cdotp {\mathbf{Y}}_{\mathbf{CNA}}\kern1.33em \mathrm{and}\kern1.33em {\mathbf{D}}_{\mathbf{MIX}}^{\prime}={\mathbf{X}}_{\mathbf{MIX}}-{\tau}^{\prime}\cdotp {\mathbf{Y}}_{\mathbf{D}\mathbf{CNA}} $$

As a result, optimal scaling parameters of 0.98 and 0.58 are obtained for CNA and DCNA, respectively. To determine the concentrations of these compounds in the analysed sample, the τ₀ values are multiplied by the analyte concentrations c_y in the calibration samples (11):

$$ {\displaystyle \begin{array}{c}{c}_x^{CNA}={\tau}_0\cdotp {c}_y^{CNA}=0.98\cdotp 0.50\ \left[{\mathrm{cm}}^3/10{\mathrm{cm}}^3\right]=0.49\ \left[{\mathrm{cm}}^3/10{\mathrm{cm}}^3\right]\\ {}{c}_x^{DCNA}={\tau}_0\cdotp {c}_y^{DCNA}=0.58\cdotp 0.50\ \left[{\mathrm{cm}}^3/10{\mathrm{cm}}^3\right]=0.29\ \left[{\mathrm{cm}}^3/10{\mathrm{cm}}^3\right]\end{array}} $$

Compared with the actual concentrations of CNA and DCNA in the mixture, 0.50 and 0.30 [cm³/10 cm³], respectively (see SI, App. B 2.3), the results are very satisfactory: in both cases the deviation is only 0.01 cm³/10 cm³.

As the above example demonstrates, RAFA allows the concentrations of selected mixture constituents to be determined independently, without their physical separation. This is a great advantage over ‘traditional’ methods of quantitative analysis, as separating all mixture components is often difficult, time-consuming [2, 3] and sometimes even impossible.

In Search of the Signal Selectivity

When little preliminary information about a sample is available, an intuitive approach is to reduce the complex system to a set of one-component subsystems for which the recorded signal is selective. Such selective subsystems can be sought within the whole dataset using certain techniques offered by factor analysis [18–20, 40].

As already shown, the number of significant singular values λ obtained for the data matrix X_MIX is directly related to the number of components contributing to the signal of the analysed system [5, 38]. The question to be addressed now is whether any slices of the matrix are dominated by a single component. To answer it, and ultimately to define the selective spectral regions of the EEM, the ‘whole’ matrix can be ‘sliced’ into smaller segments and the number of significant factors can be analysed systematically for each of them. As many schemes for systematically dividing the ‘full’ data matrix into submatrices have been proposed (e.g. [18, 40]), the Evolving Factor Analysis (EFA) approach [18–20] is discussed here as an example.

Since an excitation-emission map can be viewed as a set of n fluorescence (or excitation) spectra, the initial submatrix M₁ can be defined as the segment consisting of its ‘first’ f consecutive spectra, where f is the number of significant factors determined for the ‘whole’ original dataset X_MIX. The SVD of this submatrix is computed, yielding f singular values λ, from which the number of significant factors responsible for the variance of the currently analysed EEM segment can be estimated. The second ‘slice’ M₂ of X_MIX is then constructed by augmenting M₁ with the next consecutive spectrum (the (f + 1)-th), and the SVD is carried out again. The cycle of augmenting the submatrix M_i (Fig. 12) and calculating its singular values λ is repeated until the expanding submatrix reaches the size of the original data matrix X_MIX.

The systematic construction of submatrices may also start from the ‘opposite side’ of the data matrix. M₁ then consists of the ‘last’ f spectra (n, n − 1, …, n − f + 1) and is expanded with the spectra located to its ‘left’. To distinguish between these two equivalent ‘directions’ of submatrix augmentation, the terms ‘forward’ and ‘backward’ are used (Fig. 12) [19]. Moreover, the data matrix can be ‘sliced’ vertically as well as horizontally (Fig. 13). For an EEM, this means that one mode allows the spectral selectivity to be analysed in the excitation spectra and the other in the emission spectra.

Finally, by comparing the significance of the singular values λ obtained in each iteration, for instance graphically (see Fig. 14), one can analyse how the number of significant factors evolves with the size of the expanding submatrix (and thus with the wavelength range).
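Forward and backward EFA over one mode of the data matrix can be sketched as follows in Python with NumPy. The helper `efa` is hypothetical and is not the SI routine C.5. In this synthetic matrix the excitation wavelengths are in ascending column order, so ‘forward’ here runs violet-to-red and ‘backward’ red-to-violet, the reverse of the labelling used for Fig. 14. The band positions are invented, and the significance threshold of 0.05 is an assumption, set at roughly three times the largest singular value expected from the simulated noise.

```python
import numpy as np

def efa(X, f):
    """Forward and backward EFA over the columns of X.

    Row i of each returned array holds the f largest singular values of a
    submatrix containing f + i columns, taken from the first columns
    (forward) or from the last columns (backward).
    """
    n = X.shape[1]
    forward, backward = [], []
    for k in range(f, n + 1):
        forward.append(np.linalg.svd(X[:, :k], compute_uv=False)[:f])
        backward.append(np.linalg.svd(X[:, n - k:], compute_uv=False)[:f])
    return np.array(forward), np.array(backward)

def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

em = np.linspace(380, 560, 91)   # rows: emission wavelengths (nm)
ex = np.linspace(300, 440, 71)   # columns: excitation wavelengths (nm)
# Hypothetical components: (emission centre, width, excitation centre, width)
bands = [(400, 15, 330, 10), (450, 20, 370, 12), (500, 20, 415, 10)]
X = sum(np.outer(gaussian(em, a, b), gaussian(ex, c, d)) for a, b, c, d in bands)
X = X + 1e-3 * np.random.default_rng(2).standard_normal(X.shape)

f = 3
fwd, bwd = efa(X, f)
threshold = 0.05                       # assumed noise cut-off
n_fwd = (fwd > threshold).sum(axis=1)  # significant factors per expanding window
n_bwd = (bwd > threshold).sum(axis=1)
print("forward :", n_fwd)
print("backward:", n_bwd)
```

In this toy both edges of the excitation range are selective (one significant factor in the first window), and the count rises to three as the windows grow; the exact wavelengths at which it changes depend on the invented band widths and on the chosen threshold.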

The EFA procedure (see EFA routine, SI, App. C.5) is illustrated here on the data matrix X_MIX (Fig. 8). The ‘scanning’ was performed by augmenting an initial set of three (f = 3) emission (columns) and excitation (rows) spectra in both the ‘forward’ and ‘backward’ directions (‘red-to-violet’ and ‘violet-to-red’, respectively; Fig. 13). The resulting evolution of the λ values along the excitation and emission wavelengths is displayed in Fig. 14. The plots are interpreted as follows. ‘Going forward’ from longer to shorter excitation wavelengths (‘red-to-violet’), only one λ is noticeably different from zero until 425 nm is reached; the signal in this range is therefore selective (DCNA). Then the second singular value becomes significant (two components down to 385 nm), and finally, at 385 nm, the third one does as well. In the ‘backward’ direction, all three λ evolve simultaneously practically from the very beginning (300–305 nm), which means that there is no selective range at the ‘violet edge’ of the mixture excitation spectrum.

The EFA plots for the emission spectra are interpreted analogously. However, in contrast to the excitation spectra, the backward EFA indicates that at the ‘very end’ the signal comes from only one component, which does not fully correspond to reality (Fig. 7): a two-component signal, related to CNA and DCNA, should be observed in the range of 505–550 nm (Fig. 14, top panel). Unfortunately, the spectra of these two substances are practically identical in this range, and therefore, mathematically, this part of the dataset is associated with only one component. This is a perfect example of one of the main problems encountered in factor analysis: the mathematical and the chemical descriptions of a system are not always consistent.
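Why two components with identical spectra collapse into a single factor can be seen in a toy NumPy example (all values hypothetical). If both components share the same band shape in a window, the data matrix in that window is the outer product of that shape with the sum of their intensity profiles, so it has rank one however the concentrations vary.

```python
import numpy as np

em = np.linspace(505, 550, 24)                  # hypothetical emission window (nm)
shape = np.exp(-0.5 * ((em - 520) / 15) ** 2)   # band shape shared by both components
# Intensity profiles of two components across five hypothetical samples
profile_1 = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
profile_2 = np.array([0.3, 0.5, 0.2, 0.6, 0.1])
X = np.outer(shape, profile_1) + np.outer(shape, profile_2)

sv = np.linalg.svd(X, compute_uv=False)
print("relative singular values:", sv / sv[0])  # only the first is non-negligible
```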

Nevertheless, by combining the obtained results (colored surfaces in Fig. 14), the EFA algorithm makes it possible to determine in which regions of the excitation-emission map the recorded signal is selective and how complex the other segments of the EEM are (Fig. 15). As a result, the single-component spectral ranges can be picked out, which substantially facilitates the analysis of the studied system: in such ranges, the spectra of ‘pure’ components can be collected directly.

Sample as a ‘Black Box’

When faced with ‘fully’ unknown samples, any technique that allows at least an estimate of the individual excitation or emission spectra of their components is extremely useful. One such tool is the Generalised Rank Annihilation Method (GRAM) [24, 25]. Because GRAM is an ‘extended’ version of the RAFA approach (Chapter 4.3) [23], it is likewise focused on finding a transformation of the pair of data matrices X_MIX and Y_MIX that annihilates the signal coming from one of the fluorescent species (12).

Since GRAM, unlike ‘classical’ RAFA, enables more than one component to be determined at the same time, its successful use requires an additional condition to be met: the relative contributions of all components to the recorded signal must differ between the two compared samples. This condition is satisfied when the ratios of all concentrations are different in X_MIX and Y_MIX. The required variability can, however, also be achieved by adding a small portion of a quencher to the examined mixture (see SI, Appendix B.4.3). According to the Stern-Volmer eq. (2), the emission intensity of each fluorophore then decreases to a slightly different extent (Fig. 8), so the individual contributions of all components to the total spectrum differ before and after the addition of the quencher.

For the three-component model mixture considered here, annihilating each individual signal in turn should yield a set of three optimal scaling factors τ₀ (8).

$$ {\mathbf{D}}_{\mathbf{MIX}}^{\mathbf{0}}={\mathbf{X}}_{\mathbf{MIX}}-{\tau}_0\cdotp {\mathbf{Y}}_{\mathbf{CNA}}={\mathbf{X}}_{\mathbf{A}}+{\mathbf{X}}_{\mathbf{D}\mathbf{CNA}} $$

These can be estimated by the iterative algorithm (Figs. 16, 12), already discussed in Chapter 4.3 (RAFA), applied to a pair of the excitation-emission maps recorded before (Y_MIX) and after (X_MIX) the addition of KI (Fig. 8).

Although the contributions of all three components (A, CNA, DCNA) to the variance of the resulting difference matrix are clearly ‘eliminated’ one after another, it is not possible to tell which τ₀ value refers to which analyte. On its own, this information is therefore of little use for the quantitative analysis of the sample.

However, a ‘smart’ mathematical transformation of the matrices X_MIX and Y_MIX makes it possible to obtain the optimal scaling parameters τ₀ and relate them to the excitation (S_EX) and emission (S_EM) spectra of all components. This approach, known as the non-iterative version of GRAM [23–25], consists of three main steps (SI, App. A.5). First, one of the data matrices (preferably the ‘reference’ one, here Y_MIX) is decomposed with the SVD algorithm:

$$ {\mathbf{Y}}_{\mathbf{MIX}}=\mathbf{U}\boldsymbol{\Lambda } {\mathbf{V}}^{\mathbf{T}} $$

Next, the second data matrix (here X_MIX) and the SVD matrices truncated to f = 3 factors (Fig. 3) are used to form an auxiliary square matrix H [24]

$$ \mathbf{H}={\overline {\mathbf{U}}}^{\mathbf{T}}{\mathbf{X}}_{\mathbf{MIX}}\overline {\mathbf{V}}{\overline {\boldsymbol{\Lambda}}}^{-\mathbf{1}} $$

for which the eigenvector-eigenvalue problem is finally solved.

$$ \mathbf{Hr}={\tau}_0\mathbf{r} $$

The resulting eigenvalues are identical to the optimal scaling factors τ₀ (11) (Fig. 16), while the matrix R of the associated eigenvectors r can be used to obtain the excitation and emission spectra of the ‘pure’ components (7c; see SI, App. A.5, [24]):

$$ {\mathbf{S}}_{\mathbf{EM}}=\overline {\mathbf{U}}\mathbf{R} $$

$$ {\mathbf{S}}_{\mathbf{EX}}^{\mathbf{T}}={\mathbf{R}}^{-\mathbf{1}}\overline {\boldsymbol{\Lambda}}{\overline {\mathbf{V}}}^{\mathbf{T}} $$

Each τ₀ value can then be assigned to a specific mixture component.
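The three steps of direct GRAM translate almost line by line into NumPy. The following is a minimal sketch under simplifying assumptions, not the SI routine C.6. The data are noise-free synthetic EEMs built from hypothetical Gaussian bands, and the ‘quenching’ factors 0.9, 0.7 and 0.5 are invented. Because eigenvectors (and hence the recovered spectra) are defined only up to scale and sign, they are compared with the true spectra through correlation coefficients.

```python
import numpy as np

def gram(X, Y, f):
    """Direct GRAM: eigenvalues tau0 and spectra from the pair (X, Y), f factors."""
    U, lam, Vt = np.linalg.svd(Y, full_matrices=False)   # Y = U Lambda V^T
    U_f, lam_f, V_f = U[:, :f], lam[:f], Vt[:f, :].T      # truncation to f factors
    H = U_f.T @ X @ V_f @ np.diag(1.0 / lam_f)           # H = U^T X V Lambda^-1
    tau0, R = np.linalg.eig(H)                           # H r = tau0 r
    tau0, R = tau0.real, R.real   # eigenvalues are real here; drop round-off imaginary parts
    S_em = U_f @ R                                        # emission spectra (columns)
    S_ex_T = np.linalg.inv(R) @ np.diag(lam_f) @ V_f.T    # excitation spectra (rows)
    return tau0, S_em, S_ex_T

def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

em = np.linspace(380, 560, 91)
ex = np.linspace(300, 440, 71)
S_em_true = np.column_stack([gaussian(em, c, w) for c, w in [(400, 15), (450, 20), (500, 20)]])
S_ex_true = np.column_stack([gaussian(ex, c, w) for c, w in [(340, 12), (380, 15), (415, 10)]])
conc = np.array([0.8, 0.5, 0.3])
tau_true = np.array([0.9, 0.7, 0.5])   # assumed fraction of each signal left after quenching

Y_mix = S_em_true @ np.diag(conc) @ S_ex_true.T              # before adding the quencher
X_mix = S_em_true @ np.diag(tau_true * conc) @ S_ex_true.T   # after adding the quencher

tau0, S_em, S_ex_T = gram(X_mix, Y_mix, f=3)
for t, s in zip(tau0, S_em.T):
    k = int(np.argmin(np.abs(tau_true - t)))   # match each eigenvalue to a component
    r = abs(np.corrcoef(s, S_em_true[:, k])[0, 1])
    print(f"tau0 = {t:.3f} -> component {k}, |r| with true emission spectrum = {r:.4f}")
```

With real, noisy data the truncation to f factors matters. Small imaginary parts in the eigenvalues would then signal that the two matrices do not share a common, well-conditioned set of components.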

The fluorescence emission and excitation spectra, ‘extracted’ from the model excitation-emission maps by the direct GRAM approach (see GRAM routine - SI, App. C.6), are presented in Fig. 17. As can be noticed, the calculated spectra exhibit a very high similarity to the spectra recorded for individual components.

This indicates that GRAM can be successfully used for both qualitative and quantitative analysis of complex mixtures. It is worth noting that no special conditions have to be fulfilled for the former purpose. For the latter, however, a proper calibration sample has to be prepared (just as in ‘classical’ RAFA), because the quencher addition technique presented here is not suitable for determining absolute concentrations.

Finally, if needed, the spectra estimated by GRAM can be refined with dedicated algorithms, for example to remove residual negative values (e.g. ALS [41]; for the basics of the approach see SI, App. A.6, and for the routine, App. C.7).

Factor Analysis in Physico-Chemical Studies

Since factor analysis methods are widely used in physicochemical studies of multi-component systems (e.g. in kinetics and thermodynamics) [42–45], an example of such an application is briefly discussed at the end of this article.

For the model system of three fluorophores (A, CNA and DCNA), the physicochemical characterisation may involve, for instance, estimating the Stern-Volmer quenching constant K_SV of each substance (2). For that purpose, the decrease in the individual emission intensities caused by the addition of a quencher has to be evaluated. Determining the ratio of the fluorescence intensities measured before (I⁰_em) and after (I^Q_em) the addition of a certain amount Q of the quencher (2) is where the GRAM or RAFA approach discussed above comes into play (Chapters 4.3 and 4.5). To reduce the measurement time, the spectra can be recorded at only a few (here at least three, f = 3) excitation (or emission) wavelengths at which all three components contribute (MIX 3 range in Figs. 14 and 15). The excitation lines of 345, 355 and 365 nm may serve as an example (see SI, Appendix B.4.3). The fluorescence spectra are measured for the unquenched sample and after each addition of a successive portion of the quencher Q. As a result, the ‘reference’ matrix Y₀ (Q = 0) and a set of consecutive data matrices X_Q1, X_Q2, X_Q3, etc. are obtained. Using either the iterative or the direct version of GRAM, a set of optimal scaling parameters τ₀ is determined for every pair of matrices Y₀ and X_Q (X_Q = X_Q1, X_Q2, …).

$$ {\mathbf{D}}_{\mathbf{Q}}={\mathbf{X}}_{\mathbf{Q}}-\tau \cdotp {\mathbf{Y}}_{\mathbf{0}} $$

Because Y₀ serves as the ‘reference’ matrix, the obtained parameters τ₀ describe the ratio of the quenched (I^Q_em) to the unquenched (I⁰_em) fluorescence intensity of each component at a given quencher concentration Q. Thus, the reciprocals of τ₀ are identical to the intensity ratios defined by the Stern-Volmer eq. (2).

$$ \frac{I_{em}^0}{I_{em}^Q}={\tau_0}^{-1}=1+{K}_{SV}\cdotp Q $$

Consequently, to determine the Stern-Volmer quenching constants K_SV, the reciprocals of τ₀ are plotted against the quencher concentration Q (Fig. 18), and a linear regression (2) with the intercept fixed at unity is performed. The slope of the best-fit straight line gives the value of K_SV (Table 2). The full routine can be found in SI, Appendix C.8.
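The unit-intercept fit has a closed form. Minimising Σ(1/τ₀,i − 1 − K_SV·Q_i)² gives K_SV = ΣQ_i(1/τ₀,i − 1)/ΣQ_i². The following is a minimal pure-Python sketch; the function name and the data are hypothetical, and the τ₀ values are generated from an assumed K_SV = 25 dm³ mol⁻¹ so that the fit can be checked.

```python
def stern_volmer_ksv(Q, tau0):
    """Least-squares K_SV for 1/tau0 = 1 + K_SV * Q with the intercept fixed at 1."""
    y = [1.0 / t for t in tau0]                       # I0/IQ ratios
    num = sum(q * (yi - 1.0) for q, yi in zip(Q, y))
    den = sum(q * q for q in Q)
    return num / den

# Hypothetical quencher concentrations (mol dm^-3) and tau0 values
Q = [0.0, 0.01, 0.02, 0.03, 0.04]
K_assumed = 25.0
tau0 = [1.0 / (1.0 + K_assumed * q) for q in Q]
print(f"K_SV = {stern_volmer_ksv(Q, tau0):.2f} dm^3 mol^-1")
```

Fixing the intercept uses the physical constraint τ₀ = 1 at Q = 0 rather than estimating it from the data, which leaves a single fitted parameter.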

Table 2 Stern-Volmer quenching constants K_SV (2) determined with GRAM, selective signal analysis and sequential ‘cascade’ RAFA (Fig. 19). For comparison, the Stern-Volmer constants determined independently for each of the ‘pure’ components are also presented


An alternative, though less ‘direct’, approach is to obtain three sets of fluorescence quenching spectra, one for each fluorophore. EFA (Figs. 14 and 15) shows that both anthracene and 9,10-dicyanoanthracene exhibit selective emission in certain wavelength regions of the EEM. Therefore, by performing measurements under these spectral conditions, one can directly obtain a ‘pure’ quenched fluorescence signal for both A and DCNA (Fig. 18, Table 2). A selective signal for cyanoanthracene, however, cannot be obtained in this way, so a more sophisticated method has to be applied.

EFA performed on the excitation-emission map reveals that the signal of CNA can be observed in some two-component regions (MIX 2, Fig. 15). As the fluorescence quenching spectra of both A and DCNA are known, RAFA (or GRAM) can now be used to remove the signal contribution of the counterpart fluorophore by annihilating it. Analogously, the same procedure can be applied to decompose spectra in which the signal comes from three components. An exemplary algorithm for obtaining the series of quenched fluorescence spectra of A, CNA and DCNA is presented below.

1. The spectra are recorded in the region selective for DCNA, using the 425 nm excitation line.
2. Simultaneously, the two-component fluorescence spectra (CNA + DCNA) are measured with 400 nm excitation, and the three-component spectra (A + CNA + DCNA) with the 355 nm excitation line.
3. The RAFA algorithm is applied to the first two datasets (the minimum of the second singular value is sought). The resulting two-component spectra are thereby ‘purified’ of the DCNA contribution, which yields the quenched fluorescence spectra of CNA (Fig. 19, step 1).
4. The DCNA contribution is then annihilated in an analogous manner in the three-component dataset (the evolution of the third singular value is traced). RAFA is then repeated for the resulting two-component mixture (A + CNA, second singular value). Once the CNA contribution disappears, the obtained spectra represent the ‘pure’ signal of A (Fig. 19, step 2).

4′. Alternatively, the spectral contributions of CNA and DCNA to the third dataset can be determined simultaneously (the third singular value is traced independently for each constituent). Both matrices containing the ‘pure’ spectra of these compounds are then subtracted from the data matrix of the three-component system. The final result should be identical to that of the step-wise approach.
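The step-wise variant (steps 3 and 4 above) can be sketched in NumPy on synthetic data. Everything in the example is an assumption made for illustration, not the SI routine C.9. The Gaussian emission bands, the quenching constants 5, 25 and 60, the relative excitation efficiencies at 425, 400 and 355 nm, and the noise-free data are all invented. Each dataset is a matrix with emission wavelengths in rows and quencher levels in columns, and the hypothetical helper `rafa` returns both τ₀ and the annihilated difference matrix.

```python
import numpy as np

def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

def rafa(X, Y_ref, f, taus):
    """Grid-search RAFA: tau0 minimising the f-th singular value, plus X - tau0 * Y_ref."""
    lam_f = [np.linalg.svd(X - t * Y_ref, compute_uv=False)[f - 1] for t in taus]
    tau0 = taus[int(np.argmin(lam_f))]
    return tau0, X - tau0 * Y_ref

def ksv_unit_intercept(Q, series):
    """K_SV from I0/I = 1 + K_SV * Q (intercept fixed at 1), I = integrated emission."""
    intensity = series.sum(axis=0)   # one integrated intensity per quencher level
    y = intensity[0] / intensity     # I0 / IQ, assuming Q[0] == 0
    return float(np.sum(Q * (y - 1.0)) / np.sum(Q * Q))

# Hypothetical system: rows = emission wavelengths, columns = quencher levels
em = np.linspace(380, 560, 91)
Q = np.array([0.0, 0.01, 0.02, 0.03, 0.04, 0.05])
spectra = {"A": gaussian(em, 400, 15), "CNA": gaussian(em, 450, 20), "DCNA": gaussian(em, 500, 20)}
K_true = {"A": 5.0, "CNA": 25.0, "DCNA": 60.0}
term = {k: np.outer(spectra[k], 1.0 / (1.0 + K_true[k] * Q)) for k in spectra}

# Assumed relative excitation efficiencies at 425, 400 and 355 nm
X425 = term["DCNA"]
X400 = term["CNA"] + 0.6 * term["DCNA"]
X355 = term["A"] + 0.5 * term["CNA"] + 0.2 * term["DCNA"]

taus = np.linspace(0.0, 2.0, 2001)            # tau grid with a step of 0.001
_, cna = rafa(X400, X425, f=2, taus=taus)     # step 3: DCNA removed -> CNA series
_, a_cna = rafa(X355, X425, f=3, taus=taus)   # step 4a: DCNA removed -> A + CNA
_, a_only = rafa(a_cna, cna, f=2, taus=taus)  # step 4b: CNA removed -> A series

for name, series in [("A", a_only), ("CNA", cna), ("DCNA", X425)]:
    print(f"{name:4s} K_SV = {ksv_unit_intercept(Q, series):.2f}")
```

Because the scaling factors were chosen to lie on the τ grid and the data are noise-free, the quenching constants are recovered essentially exactly. With measured spectra, both grid resolution and noise would limit the accuracy.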

When the above algorithm is completed (see routine App. C.9 in SI), the three series of quenched fluorescence spectra of individual components are recovered from the multi-component dataset (Fig. 19). The Stern-Volmer plots are then obtained in the ‘classical’ way, that is by direct calculation of the proper ratios of the unquenched to quenched emission intensities (Fig. 18). The resulting Stern-Volmer constants K_SV can be found in Table 2.

Comparing the values of the Stern-Volmer constants obtained by GRAM, ‘cascade’ RAFA and selective region analysis (Table 2), it can be concluded that all these approaches remain consistent, as they provide very similar results.

It is worth noting that the Stern-Volmer quenching constants estimated for all three components in the mixture differ slightly from those determined independently for single-component solutions (see SI, Appendix B.4.2). The higher values obtained for the mixture may suggest that some subtle additional interactions occur between the molecules in the mixture. Such an effect usually escapes observation when the system is separated into its components so that the analysis can be performed in the ‘traditional’ way.

The above example clearly shows that some phenomena unveiled by the methods of factor analysis remain ‘unavailable’ for classical analytical techniques.

A Brief Summary

The main purpose of the examples discussed in this article was to highlight the opportunities and benefits of applying the chemometric methods in the everyday laboratory routine.

A few practical examples showed that factor analysis techniques can be successfully used to a) estimate the number of components in the examined sample (PCA), b) search for the selective signal in the spectra of a mixture (EFA), c) verify whether a particular substance is present (or not) in the sample (TFA) and d) perform qualitative and quantitative analysis of the sample (RAFA & GRAM). It is worth mentioning that all the results were obtained solely by computer analysis of the datasets measured for the mixtures. No physical separation of the components was required at any step of the analysis, which offers an alternative to ‘traditional’ approaches such as chromatography and electrophoresis.

Although the potential of the techniques presented here is hopefully already apparent, they are admittedly just the ‘tip of the iceberg’. Nowadays, the number of available algorithms and their variants is practically countless. Moreover, the techniques may be combined in both highly specific and general ways, which further multiplies the number of tools for the analysis of spectral datasets offered by chemometrics.

Unfortunately, this ‘mathematized’ treatment of the quantitative aspects of spectroscopic data seems to be unpopular, and sometimes even unknown, in many communities of chemists and spectroscopists. By publishing this article, the Authors therefore hope to bring factor analysis algorithms closer to creative researchers working in various domains of chemistry.

Data Availability

All data generated or analysed during this study are included in supplementary information files (Appendix D) for this article.

References

1. Zinatloo-Ajabshir S, Heidari-Asil SA, Salavati-Niasari M (2021) Simple and eco-friendly synthesis of recoverable zinc cobalt oxide-based ceramic nanostructure as high-performance photocatalyst for enhanced photocatalytic removal of organic contamination under solar light. Sep Purif Technol 267:118667
2. Rubio-Clemente A, Chica E, Peñuela GA (2017) Rapid determination of anthracene and benzo (a) pyrene by high-performance liquid chromatography with fluorescence detection. Anal Lett 50:1229–1247
3. Nie S, Dadoo R, Zare RN (1993) Ultrasensitive fluorescence detection of polycyclic aromatic hydrocarbons in capillary electrophoresis. Anal Chem 65:3571–3575
4. Malinowski ER, Howery DG (1980) Factor analysis in chemistry. Wiley, New York
5. Maeder M, Neuhold YM (2007) Practical data analysis in chemistry. Elsevier, Amsterdam
6. Brown S et al (2009) Comprehensive Chemometrics. Chemical and biochemical data analysis. Elsevier, Amsterdam
7. Auf der Heyde TPE (1990) Analyzing chemical data in more than two dimensions: a tutorial on factor and cluster analysis. J Chem Educ 67:461–469
8. Harvey DT, Bowman A (1990) Factor analysis of multicomponent samples. J Chem Educ 67:470–472
9. Msimanga HZ, Charles MJ, Martin NW (1997) Simultaneous determination of aspirin, salicylamide, and caffeine in pain relievers by target factor analysis. J Chem Educ 74:1114–1117
10. Rodríguez-Rodríguez C, Amigo JM, Coello J, Maspoch S (2007) An introduction to multivariate curve resolution-alternating least squares: spectrophotometric study of the acid–base equilibria of 8-hydroxyquinoline-5-sulfonic acid. J Chem Educ 84:1190–1192
11. Gilbert MK, Luttrell RD, Stout D, Vogt F (2008) Introducing chemometrics to the analytical curriculum: combining theory and lab experience. J Chem Educ 85:135–137
12. Kumar K, Mishra AK (2015) Application of partial Least Square (PLS) analysis on fluorescence data of 8-Anilinonaphthalene-1-sulfonic acid, a polarity dye, for monitoring water adulteration in ethanol fuel. J Fluoresc 25:1055–1061
13. Kumar K (2018) Processing excitation-emission matrix fluorescence and Total synchronous fluorescence spectroscopy data sets with constraint randomised non-negative factor analysis: a novel fluorescence based analytical procedure to analyse the Multifluorophoric mixtures. J Fluoresc 28:1075–1092
14. Dramićanin T, Zeković I, Periša J, Dramićanin MD (2019) The parallel factor analysis of beer fluorescence. J Fluoresc 29:1103–1111
15. Kumar K (2019) Non-negative factor (NNF) assisted partial Least Square (PLS) analysis of excitation-emission matrix fluorescence spectroscopic data sets: automating the identification and quantification of Multifluorophoric mixtures. J Fluoresc 29:1183–1190
16. Malinowski ER (1978) Theory of error for target factor analysis with applications to mass spectrometry and nuclear magnetic resonance spectrometry. Anal Chim Acta 103:339–354
17. Malinowski ER (1989) Statistical f-tests for abstract factor analysis and target testing. J Chemom 3:49–60
18. Gampp H, Maeder M, Meyer CJ, Zuberbühler AD (1985) Calculation of equilibrium constants from multiwavelength spectroscopic data - I: mathematical considerations. Talanta 32:95–101
19. Gampp H, Maeder M, Meyer CJ, Zuberbühler AD (1985) Calculation of equilibrium constants from multiwavelength spectroscopic data - III: model-free analysis of spectrophotometric and ESR titrations. Talanta 32:1133–1139
20. Gampp H, Maeder M, Meyer CJ, Zuberbühler AD (1987) Evolving factor analysis. Comment Inorg Chem 6:41–60
21. Warner IM, Christian GD, Davidson ER, Callis JB (1977) Analysis of multicomponent fluorescence data. Anal Chem 49:564–573
22. Ho CN, Christian GD, Davidson ER (1978) Application of the method of rank annihilation to quantitative analyses of multicomponent fluorescence data from the video fluorometer. Anal Chem 50:1108–1113
23. Lorber A (1985) Features of quantifying chemical composition from two-dimensional data array by the rank annihilation factor analysis method. Anal Chem 57:2395–2397
24. Sanchez E, Kowalski BR (1986) Generalized rank annihilation factor analysis. Anal Chem 58:496–499
25. Wilson BE, Sanchez E, Kowalski BR (1989) An improved algorithm for the generalized rank annihilation method. J Chemom 3:493–498
26. MATLAB R2015a (2015) The MathWorks Inc, Natick, Massachusetts
27. R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
28. Van Rossum G, Drake FL (2009) Python 3 reference manual. CreateSpace, Scotts Valley
29. Lakowicz JR (2006) Principles of fluorescence spectroscopy. Springer, Boston
30. Balzani V, Ceroni P, Juris A (2014) Photochemistry and photophysics: concepts, research, applications. Wiley-VCH, Weinheim
31. Parker CA (1968) Photoluminescence of solutions. Elsevier, Amsterdam
32. Brereton RG (2016) Points, vectors, linear independence and some introductory linear algebra. J Chemom 30:358–360
33. Brereton RG (2017) Basic matrix algebra. J Chemom 31:e2833
34. Brereton RG (2016) Orthogonality, uncorrelatedness, and linear independence of vectors. J Chemom 30:564–566
35. Witek ŁJ, Turek AM (2017) A novel algorithm for resolution of three-component mixtures of fluorophores by fluorescence quenching. Chemom Intell Lab Syst 160:77–90
36. Kałka AJ, Turek AM (2018) Fast decomposition of three-component spectra of fluorescence quenching by white and grey methods of data modeling. J Fluoresc 28:615–632
37. Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2:37–52
38. Malinowski ER (1977) Determination of the number of factors and the experimental error in a data matrix. Anal Chem 49:612–617
39. Grung B, Kvalheim OM (1995) Interactive rank annihilation: a graphic approach to quantification in grey multicomponent systems. Anal Chim Acta 316:225–232
40. Manne R, Shen H, Liang Y (1999) Subwindow factor analysis. Chemom Intell Lab Syst 45:171–176
41. Jaumot J, Gargallo R, de Juan A, Tauler R (2005) A graphical user-friendly interface for MCR-ALS: a new tool for multivariate curve resolution in MATLAB. Chemom Intell Lab Syst 76:101–110
42. Abdollahi H, Golshan A (2011) Rank annihilation factor analysis method for spectrophotometric study of second-order reaction kinetics. Anal Chim Acta 693:26–34
43. Abdollahi H, Nazari F (2003) Rank annihilation factor analysis for spectrophotometric study of complex formation equilibria. Anal Chim Acta 486:109–123
44. Saltiel J, Sears DF, Turek AM (2001) UV spectrum of the high energy conformer of 1,3-butadiene in the gas phase. J Phys Chem A 105:7569–7578
45. Tauler R, Marqués I, Casassas E (1998) Multivariate curve resolution applied to three-way trilinear data: study of a spectrofluorimetric acid–base titration of salicylic acid at three excitation wavelengths. J Chemom 12:55–75


Code Availability

Exemplary MATLAB codes are included in supplementary information files (Appendix C) for this article.

Funding

No funding was received for conducting this study.

Author information

Authors and Affiliations

Faculty of Chemistry, Jagiellonian University, 2 Gronostajowa St, 30 387, Cracow, Poland
Andrzej J. Kałka & Andrzej M. Turek


Contributions

All authors contributed to the study conception and design.

Corresponding author

Correspondence to Andrzej M. Turek.

Ethics declarations

Conflicts of Interest/Competing Interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

ESM 1

(PDF 879 KB)

ESM 2

(ZIP 654 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Kałka, A.J., Turek, A.M. Do Spectra Live in the Matrix? A Brief Tutorial on Applications of Factor Analysis to Resolving Spectral Datasets of Mixtures. J Fluoresc 31, 1599–1616 (2021). https://doi.org/10.1007/s10895-021-02753-w


Received: 15 February 2021
Accepted: 20 May 2021
Published: 06 August 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s10895-021-02753-w
