1 Introduction

Raman spectroscopy detects vibrational spectra. Analysis of those spectra provides insight into the chemical and physical properties of molecular structures, which is important in various research areas in biology, medicine and industry [1,2,3]. Nowadays, Raman spectrometers are capable of generating spectral recordings down to the femtosecond time scale. Besides spectral recordings of stable substances, such time-resolved Raman spectroscopy allows for monitoring of events like intramolecular rearrangements and chemical reactions [4]. We thereby obtain measured Raman spectra as a function of time, which depict both main characteristics of an observed process: On the one hand, each measured spectrum is a fingerprint of the compounds present and therefore reflects the intrinsic spectra of the individual species or molecular states involved in the reaction. On the other hand, the relative contributions of the involved spectra to each measured spectrum reflect the momentary composition of the sample at the corresponding time. From the full series of generated spectra we can hence draw conclusions about the kinetics of the underlying reaction process. Consequently, the central task of time-resolved Raman data analysis is deciphering the series of measured spectra with respect to the individual component spectra and their temporal evolution.

This article is organized as follows. In Sect. 2, we give an overview of the non-negative matrix factorization (NMF) approaches and algorithms known so far. In particular, we present the separable NMF method, which found application in the approach for spectral analysis in [5]. Our new NMF approach, as well as the algorithmic details of the corresponding computational method, are introduced in Sect. 3. In Sect. 4, we present numerical results of our novel method. On the one hand, we thereby discuss recovery results for synthetic measurement data with increasing interference of the component spectra and presence of measurement noise. On the other hand, we verify the influence of the single components of our adaptable objective function through recovery results for certain choices of weighting coefficients.

2 Non-negative matrix factorization (NMF)

From a mathematical point of view the non-negative measurement matrix M, which contains the discretized time-resolved Raman spectra, can be expressed as

$$\begin{aligned} M \ = \ WH \qquad W\in {\mathbb {R}}_+^{n\times r}, H \in {\mathbb {R}}_+^{r\times m}, \end{aligned}$$
(1)

where the columns of W represent the component spectra and the rows of H the time course of the relative concentrations. A factorization of M into the two matrices W and H is interesting from the chemical point of view: the matrix W identifies the substances involved in the reaction and the matrix H allows inference on the speed of the reaction. Note that this is not possible by considering only one row or column of the matrix M. Summing up, time-resolved Raman spectral data can be modeled as the product of two non-negative matrices representing the single component spectra and the underlying reaction kinetics (Fig. 1).

Fig. 1 Interpolated visualization of the measurement data matrix M. The matrix H represents the “normed” intensity which we term relative concentration. The matrix W represents the wavenumber

Recovering these factorization matrices, given only the measured time-resolved spectra, requires non-negative matrix factorization (NMF). In general, NMF is a useful tool for the analysis of high-dimensional data and therefore a relevant topic in present-day research in many scientific fields [6,7,8]. Besides detecting a compressed representation, NMF delivers insights into the structure and features of the given data by extracting easily interpretable factors.

The goal of non-negative matrix factorization (see e.g. [8, 9] and the references therein) is to solve an optimization problem for a given non-negative data matrix M: find matrices W and H with non-negative entries such that the product WH is the best possible approximation of M. NMF is a linear dimension reduction technique for a non-negative data set, which means that each data point (column of M) is approximated by a linear combination of the columns of matrix W.

Mathematical background The columns of W form a basis for the column space of matrix M and the columns of matrix H are the weights to approximate the data points. The NMF problem is \(\mathcal{NP}\)-hard [10], due to the non-negativity constraints on W and H. Moreover, the solution of an NMF problem is generally not unique. To see this, assume that \(W>0\), \(H>0\), and that there exists an invertible matrix D, different from the identity, such that \(WD>0\) and \(D^{-1}H >0\). Then \(M=(WD)(D^{-1}H)\) is a second non-negative factorization of M, which shows that the NMF is not unique.
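The non-uniqueness argument can be checked numerically. The following is a minimal sketch with hypothetical toy matrices W, H and D (none of these values come from the article); it only illustrates that two different positive factor pairs can reproduce the same M.

```python
import numpy as np

# Hypothetical toy factors with strictly positive entries (illustration only).
W = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [2.0, 2.0]])
H = np.array([[0.6, 0.5, 0.3],
              [0.4, 0.5, 0.7]])
M = W @ H

# An invertible D that is neither the identity, a permutation nor a scaling.
D = np.array([[0.9, 0.1],
              [0.1, 0.9]])

W2 = W @ D                   # transformed "spectra"
H2 = np.linalg.inv(D) @ H    # transformed "concentrations"

print("same product:", np.allclose(W2 @ H2, M))
print("W2 > 0:", bool((W2 > 0).all()), " H2 > 0:", bool((H2 > 0).all()))
```

Both factor pairs are admissible for the plain NMF problem, which is why additional structural properties are needed to single out a meaningful solution.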

In the absence of the non-negativity constraints, the problem could be solved efficiently by using methods such as truncated singular value decomposition (TSVD) [11]. One of the common approaches for solving the NMF problem is the alternating least squares approach [12, 13]. In this approach, one of the two matrices is fixed, for example H, and then we find the corresponding optimal solution for W, which is a convex optimization problem with non-negativity constraints. The roles of W and H are then alternated. If the matrix M satisfies a separability condition, then we can solve the NMF problem efficiently. By definition, a matrix M is r-separable if there exists an exact non-negative factorization of rank r in which each column of W is equal to a column of M. This means that each column of W, being a basis vector for the column space of M, appears somewhere in the data matrix M as one of its columns.
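To make the alternating idea concrete, here is a minimal sketch of a simplified variant. It is an assumption of this example, not the method of [12, 13]: each subproblem is solved as an unconstrained least-squares problem and the result is then projected onto the non-negative orthant by clipping, instead of solving the constrained convex subproblem exactly. The function name `projected_als` and all parameter values are hypothetical.

```python
import numpy as np

def projected_als(M, r, iters=200, seed=0):
    """Simplified alternating least squares with projection onto W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    W = rng.random((n, r))
    for _ in range(iters):
        # Fix W: least-squares solution of W H = M, then clip negatives to zero.
        H = np.clip(np.linalg.lstsq(W, M, rcond=None)[0], 0.0, None)
        # Fix H: solve the transposed problem H^T W^T = M^T, then clip.
        W = np.clip(np.linalg.lstsq(H.T, M.T, rcond=None)[0].T, 0.0, None)
    return W, H

# Toy usage on an exactly factorizable non-negative matrix.
rng = np.random.default_rng(1)
M_toy = rng.random((30, 3)) @ rng.random((3, 20))
W_als, H_als = projected_als(M_toy, r=3)
print("relative residual:", np.linalg.norm(M_toy - W_als @ H_als) / np.linalg.norm(M_toy))
```

The clipping step is a heuristic; it keeps the sketch short but does not carry the convergence guarantees of an exact non-negative least-squares solver.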

Geometrically, in the separable case the columns of W are the vertices of the convex hull of the columns of M. The separability condition means that all columns of M can be reconstructed as convex combinations of the r columns of W, which are themselves columns of M [14, 15]. This is only possible if the convex hull of the columns of M is a simplex spanned by r columns of M, which is not necessarily the case.

NMF in the context of measurement data Given a component-wise non-negative matrix M of dimension \(n \times m\) and an integer \(r>0\), NMF determines component-wise non-negative matrices W and H of dimensions \(n \times r\) and \(r \times m\), respectively, such that \(M=WH\). The integer r is generally referred to as the rank of the factorization. Assuming M to represent m measurements of n non-negative variables, we interpret the NMF task as follows: we aim to identify r ingredients which allow for recovery of all m measurements by composition according to respective contributions. The ingredients are then reflected by the columns of the factorization matrix W, while the columns of H contain the corresponding mixing coefficients.

In practice, considering measured data, and therefore allowing noise or other forms of data uncertainty, generally rules out the existence of an exact NMF in terms of \(M=WH\). Thus, from now on, we want to compute component-wise non-negative matrices W and H such that WH is an approximation of M.

In the context of Raman data spectral analysis, focusing on the non-negativity of involved matrices becomes reasonable through the model for time-resolved Raman spectral data of Luce et al. [5]. They introduce an approach to express a series of spectral recordings of a chemical reaction (matrix M) as the matrix product of the component spectra (matrix W) and the evolution of relative concentrations of these reaction components (matrix H). Based on this model and synthetic spectral data, which satisfy the recently much-cited separability assumption, the authors of [5] furthermore present an algorithm to detect a factorization \(WH=M\) using separable NMF methods.

Inspired by their results, we propose a novel method which does not rely on the separability assumption, since in the context of a spectral analysis this assumption is very restrictive. The separability assumption means that the convex hull of the columns of M is given by the column vectors of W. This is not necessarily the case for real-world data. In other words, this assumption means that the convex hull of the columns of M is a simplex. It is true that we are searching for a simplex that includes all column vectors of M, but the convex hull of the columns of M need not be a simplex. Thus, we will exploit additional chemical or physical model aspects in order to find the optimal simplex including the columns of M without the separability assumption. The purpose of this new approach is to use an adaptable objective function which takes into account only the common structural properties of the sought-for, process-defining matrices W and H.

3 Solving an optimization problem for NMF

In the following we pick up the concepts of the previous sections and introduce a new NMF approach which is tailored to the analysis of time-resolved Raman spectral data. Recall from (1) that the recovered non-negative matrices represent the component spectra of the involved species (W) and the reaction kinetics in terms of the evolution of relative concentrations (H). Our novel NMF approach differs from the methods discussed so far as it is mainly based on minimization of an objective function which directly incorporates all known structural properties of the sought-for matrices W and H. Furthermore, our approach does not depend on the restrictive separability assumption. In contrast to Luce et al. [5], we apply our method even to non-separable measurement data. Additional flexibility and adaptability of the novel approach will be demonstrated in the numerical results in Sect. 4. Here we present the leading ideas of this approach as well as the details of the corresponding computational method.

3.1 Optimization criteria for NMF

In the following we propose a novel approach based on an objective function which includes the required structural properties of the sought-after matrices W and H.

Claims on the matrices W and H In the following we assume that the component spectra are positive, such that W is a positive matrix. The component-wise non-negativity of the kinetics H is also reasonable, since relative concentrations are, in general, non-negative. Furthermore, since its entries represent relative concentrations, each column of H is a priori supposed to sum up to 1.

For each of the r chemical species, the relative concentration is given by the relative concentration function \(h_s\),

$$\begin{aligned} h_s : \left[ 0,T \right] \rightarrow \left[ 0,1 \right] , \qquad s = 1,\ldots , r, \end{aligned}$$

describing the relative concentration of species s at time \(t \in \left[ 0,T \right]\) of the considered reaction.

Since the concentrations \(h_s(t)\) are relative we have

$$\begin{aligned} \sum _{s=1}^r h_s(t) =1 \text { for each } t \in [0,T]. \end{aligned}$$

By using m time steps for discretization of the concentration functions \(h_s(t)\) we obtain the column stochastic matrix

$$\begin{aligned} H = \begin{bmatrix} h_1(t_0)&{} \dots &{}\dots &{} h_1(t_{m-1})\\ h_2(t_0)&{} \dots &{} \dots &{} h_2(t_{m-1}) \\ \vdots &{} \dots &{} \dots &{}\vdots \\ h_r(t_0)&{} \dots &{}\dots &{} h_r(t_{m-1}) \end{bmatrix}. \end{aligned}$$

The sequential Raman-measurements cannot be modelled as a “random picking of spectra”. The temporal order of measurements is important. Let the columns of H be given by \(h(t_i), i=0,\ldots ,m-1\), i.e.

$$\begin{aligned} H= [h(t_0)|\dots \dots |h(t_{m-1})],\quad h(t_i) \in {\mathbb {R}}^r, i=0,\ldots ,m-1. \end{aligned}$$

Given the initial “concentrations” \(h(t_{i-1})\) there is a kinetics (or some Markov process) providing the concentrations of the next time-step \(h(t_i)\). This can be modelled by assuming a transition matrix P for the autonomous Markov process, if the time intervals are always constant. Thus, we claim that there exists a (row) stochastic matrix \(P \in {\mathbb {R}}^{r\times r}\) such that

$$\begin{aligned} (h(t_{i-1}))^T \cdot P = (h(t_{i}))^T,\qquad i=1,\ldots ,m-1. \end{aligned}$$
(2)

In other words, the change of the relative concentration between the time steps can be interpreted as a Markov process. The construction of this matrix P will be explained later.
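Relation (2) can be illustrated with a small simulation. The sketch below uses a hypothetical row-stochastic matrix P for \(r=3\) species and a hypothetical number of time steps; it only shows that propagating a concentration vector with such a P automatically yields a non-negative, column stochastic H.

```python
import numpy as np

# Hypothetical row-stochastic transition matrix for r = 3 species (illustration only).
P = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.80, 0.15],
              [0.00, 0.00, 1.00]])

m = 50                          # hypothetical number of time steps
H = np.zeros((3, m))
H[:, 0] = [1.0, 0.0, 0.0]       # only the first species is present at t_0

for i in range(1, m):
    # Row vector times P: h(t_i)^T = h(t_{i-1})^T P, as in (2).
    H[:, i] = H[:, i - 1] @ P

print("column sums equal 1:", np.allclose(H.sum(axis=0), 1.0))
print("all entries non-negative:", bool((H >= 0).all()))
```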

In summary, the objective function in our approach penalizes violations of the following properties:

  (i) W is component-wise non-negative,

  (ii) H is component-wise non-negative,

  (iii) H is column stochastic,

  (iv) P is component-wise non-negative, and

  (v) P is row stochastic.

We thus arrive at the following objective function

$$\begin{aligned} \Psi= & {} \alpha \left( \min \limits _{i,j} \; {W}_{ij} \right) + \beta \left( \min \limits _{i,j} \; {H}_{ij} \right) +\gamma \left( \max \limits _j \; |\sum \limits _{i=1}^r \; {H}_{ij} -1| \right) \\&+ \delta \left( \min \limits _{i,j} \; {P}_{ij} \right) +\mu \left( \max \limits _i \; |\sum \limits _{j=1}^r \; {P}_{ij} -1| \right) \end{aligned}$$

It should be noted that constraint (iv) is not necessarily valid. The rows of the matrix P have to sum to one; however, the entries of P can be negative. A Galerkin projection of a Markov process from a basis of microstates to a small set of macrostates can lead to negative entries in the projected matrix P. In the real-world example in Sect. 4.3, we will show a crystallization process with a non-exponential decay of one species, which leads to a matrix P with one negative entry.
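The five penalty terms translate almost literally into code. The following is a minimal sketch of \(\Psi\) as a plain function of given matrices W, H and P; the function name `psi` and the toy inputs are assumptions of this example. Setting `delta = 0` switches off property (iv) when negative entries in P are to be tolerated.

```python
import numpy as np

def psi(W, H, P, alpha, beta, gamma, delta, mu):
    """Weighted sum of the five structural penalty terms."""
    pen_w_min = alpha * W.min()                              # (i)   smallest entry of W
    pen_h_min = beta * H.min()                               # (ii)  smallest entry of H
    pen_h_sum = gamma * np.abs(H.sum(axis=0) - 1.0).max()    # (iii) worst column sum of H
    pen_p_min = delta * P.min()                              # (iv)  smallest entry of P
    pen_p_sum = mu * np.abs(P.sum(axis=1) - 1.0).max()       # (v)   worst row sum of P
    return pen_w_min + pen_h_min + pen_h_sum + pen_p_min + pen_p_sum

# Hypothetical toy inputs.
W_toy = np.array([[1.0, 2.0], [3.0, 4.0]])
H_ok = np.array([[0.5, 0.2], [0.5, 0.8]])      # non-negative, column stochastic
H_bad = np.array([[-0.1, 0.2], [1.1, 0.8]])    # column stochastic, one negative entry
P_toy = np.eye(2)

weights = dict(alpha=-1.0, beta=-1.0, gamma=1.0, delta=0.0, mu=1.0)
psi_ok = psi(W_toy, H_ok, P_toy, **weights)
psi_bad = psi(W_toy, H_bad, P_toy, **weights)
print(psi_ok, psi_bad)
```

With negative weights on the minimum terms, a negative smallest entry raises the value of \(\Psi\), which is the behaviour the toy comparison of `H_ok` and `H_bad` is meant to show.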

Robust Perron cluster analysis (PCCA+) In the computational method of our novel NMF approach, we apply the Robust Perron Cluster Analysis (PCCA+) [16] to generate an initialization of the kinetics in matrix H. We thus briefly introduce the purpose and operating principles of PCCA+ and explain its utility in our context.

PCCA+ belongs to the family of algorithms that characterize objects of similar behaviour in order to combine them into a certain number of clusters. This kind of task plays a versatile role in several areas of computational life science. PCCA+ arises from the investigation of molecular conformation dynamics and the identification of metastable conformations [17, 18]. There, metastable conformations are clusters for which the large scale geometric structure of the observed ensemble is conserved under the influence of a spatial transition operator [19]. Translated into matrix terms, we consider a stochastic matrix \(T \in {\mathbb {R}}^{N\times N}\) (representing the discretized version of the spatial transition operator) and search for a matrix \(Y \in {\mathbb {R}}^{N\times N_C}\), which column-wise contains the clusters \(y_i ,\; i=1, \dots ,N_C\), and satisfies three requirements: Firstly, Y is non-negative. Secondly, Y is row stochastic, in order to meet the partition-of-unity constraint. Thirdly, the vectors \(y_i\) belong to the cluster of eigenvalues of T near 1.0. This means for each \(i=1, \dots ,N_C\) we have

$$\begin{aligned} Ty_i \approx y_i . \end{aligned}$$
(3)

The main idea of PCCA+ is to generate Y as a linear transformation of the matrix \(X\in {\mathbb {R}}^{N\times N_C}\), where X columnwise contains the \(N_C\) first eigenvectors of T with respect to eigenvalues close to \(\lambda _1 = 1\). PCCA+ therefore computes a non-singular transformation matrix \({\mathcal {A}}\in {\mathbb {R}}^{N_C \times N_C}\) in order to gain the non-negative, row stochastic matrix Y via

$$\begin{aligned} Y = X{\mathcal {A}}. \end{aligned}$$
(4)
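A short worked remark may help to see how a linear transformation can produce row sums equal to one. It is a sketch under two assumptions introduced here: the first column of X is the constant vector \(e=[1,\ldots ,1]^T\) (which is possible because \(Te=e\) for a row stochastic T), and \({\mathbf {1}}\) denotes the all-ones vector of length \(N_C\) while \(e_1\) denotes the first unit vector. If the first row of \({\mathcal {A}}\) sums to one and all other rows sum to zero, i.e. \({\mathcal {A}}{\mathbf {1}} = e_1\), then

$$\begin{aligned} Y {\mathbf {1}} \ = \ X {\mathcal {A}} {\mathbf {1}} \ = \ X e_1 \ = \ e , \end{aligned}$$

so every row of Y sums to one. Non-negativity of Y is not implied by this condition and has to be enforced separately.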

Above, in the paragraph on the claims on the matrices W and H, we stated that the sought-for matrix H of reaction kinetics needs to be non-negative and column stochastic. Both requirements are satisfied if we consider (4) and choose \(H=Y^T\) as an initial guess of the kinetics. Thus, in the computational method of our novel NMF approach, the preprocessing prepares the application of PCCA+ in order to generate a promising initialization of H.

Solving (4) for \({{\mathcal {A}}}\), we may find several feasible solutions \({\mathcal {A}}\in {\mathbb {R}}^{N_C \times N_C}\) providing an appropriate matrix Y. PCCA+ tackles this issue by computing \({\mathcal {A}}\) through solving an optimization problem with respect to a certain objective function. Given that the stochastic matrix T is the discretization of a transition operator (consider e.g. molecular conformation dynamics), maximization of this objective function is equivalent to the maximization of metastability between the generated clusters. In other contexts (consider e.g. geometrical cluster problems) the interpretation of the objective function may be different while still meaningful. See [17, 20, 21] for exemplary applications and illustrations of PCCA+ in several research areas.

3.2 Computational method

The main work stages in the computational method of our novel NMF approach are summarized in Algorithm 1. Note that we distinguish between the finally recovered matrices (denoted as \(W_{rec}\) and \(H_{rec}\)) and their corresponding interim results (denoted as \({\widetilde{W}}\) and \({\widetilde{H}}\)). We use the MATLAB function pinv to calculate pseudoinverses of singular or even non-square matrices and denote the pseudoinverse of a matrix A by \(A^{\dagger }\). Furthermore, \(A_+\) denotes the matrix which is constructed from A by deleting the first row, and \(A_-\) is the corresponding matrix constructed from A by deleting the last row.

Algorithm 1
  • Step 1: Preprocessing       In the preprocessing we consider \(M^T\). By subtraction of a reference point we transfer the columns of \(M^T\) into a linear space. Afterwards we perform singular value decomposition (SVD) such that we gain \(M^T = U\Sigma V^T\). In order to initialize \({\widetilde{H}}\) we want to apply PCCA+ to the leading \(r-1\) columns of U. Thus we build a matrix \({\mathcal {U}}\), which takes the role of X in (4), as follows: The first column of \({\mathcal {U}}\) is equal to \(e=\left[ 1, \ldots , 1 \right] ^T \in {\mathbb {R}}^m\), which is a requirement of PCCA+. We then fill up with columns \(1, \ldots , r-1\) of U until \({\mathcal {U}}\in {\mathbb {R}}^{m\times r}\). Subsequently, for reasons of efficiency in PCCA+, we ensure orthogonality among the columns of \({\mathcal {U}}\) [16].

  • Step 2: Initializing \({\widetilde{H}}\), \({\widetilde{W}}\), and \({\widetilde{P}}\)       We apply PCCA+ to \({\mathcal {U}}\). According to (4), we obtain a non-negative, column stochastic matrix \({\widetilde{H}}\) setting

    $$\begin{aligned} {\widetilde{H}} = \left( {\mathcal {U}} {\mathcal {A}}\right) ^T \; \in {\mathbb {R}}^{r\times m}, \end{aligned}$$
    (5)

    whereby \({\mathcal {A}}\in {\mathbb {R}}^{r\times r}\) is the computed PCCA+ transformation matrix. \({\widetilde{H}}\) is our initial guess of the kinetics of relative concentrations. Accordingly, we gain an initialization of the component spectra \({\widetilde{W}}\) through the relation

    $$\begin{aligned} M&= {\widetilde{W}} {\widetilde{H}} \nonumber \\ \Leftrightarrow \qquad {\widetilde{W}}&= \ M {\widetilde{H}}^{\dagger } = M \left( {\mathcal {A}}^T {\mathcal {U}}^T \right) ^{\dagger } \; \in {\mathbb {R}}^{n\times r} . \end{aligned}$$
    (6)

    From (2), applied to the rows of \({\widetilde{H}}^T\), we see that the matrix \({\widetilde{P}}\) is given by

    $$\begin{aligned} {\widetilde{P}}&=(({\widetilde{H}}^T)_-)^{\dagger }({\widetilde{H}}^T)_+ \nonumber \\&= {{{\mathcal {A}}}}^{-1}\big ({{{\mathcal {U}}}}_-^\dagger {{{\mathcal {U}}}}_+\big ){{{\mathcal {A}}}}. \end{aligned}$$
    (7)

    Regarding (5), (6), and (7) we express the initial guesses of the sought-for matrices only in terms of the given and processed data (M, \({\mathcal {U}}\)) and the PCCA+ transformation matrix (\({\mathcal {A}}\)).

  • Step 3: Minimizing objective function       The objective function of our novel NMF approach only incorporates structural properties of the sought-for matrices, as discussed above in the paragraph on the claims on the matrices W and H. With respect to each property we compute a penalty value as stated in the following expressions:

    $$\begin{aligned} \left. \begin{aligned} \text {Penalty 1:} \qquad&\alpha \left( \min \limits _{i,j} \; {\widetilde{W}}_{ij} \right) \qquad \qquad \\ \text {Penalty 2:} \qquad&\beta \left( \min \limits _{i,j} \; {\widetilde{H}}_{ij} \right) \qquad \qquad \\ \text {Penalty 3:} \qquad&\gamma \left( \max \limits _j \; |\sum \limits _{i=1}^r \; {\widetilde{H}}_{ij} -1| \right) \qquad \qquad \\ \text {Penalty 4:} \qquad&\delta \left( \min \limits _{i,j} \; {\widetilde{P}}_{ij} \right) \qquad \qquad \\ \text {Penalty 5:} \qquad&\mu \left( \max \limits _i \; |\sum \limits _{j=1}^r \; {\widetilde{P}}_{ij} -1| \right) \qquad \qquad \\ \end{aligned} \right\} \end{aligned}$$
    (8)

    In regard to non-negativity of light intensities and relative concentrations, penalties 1, 2, and 4 determine the smallest entries in matrices \({\widetilde{W}}\), \({\widetilde{H}}\), and \({\widetilde{P}}\). As the sum of penalty values is supposed to increase if these smallest entries appear to be negative, weighting coefficients \(\alpha\), \(\beta\), and \(\delta\) are generally chosen negative, too. For \({\widetilde{H}}\) to be column stochastic, the maximal deviation of a column sum from 1.0 is penalized in Penalty 3. Likewise, the requirement on \({\widetilde{P}}\) to be row stochastic is taken into account by computing the maximal deviation of a row sum from 1.0 in Penalty 5.

    Consider \(\Psi\) to represent the sum of penalty values. As we choose the relations (5) and (6) for initialization, the input arguments for the objective function are the matrices M, \({\mathcal {U}}\) and \({\mathcal {A}}\). Since we perform optimization with respect to parameter \({\mathcal {A}}\), the minimization problem can be written in the form

    $$\begin{aligned} \min \limits _{{\mathcal {A}}\in {\mathbb {R}}^{r\times r}} \; \Psi ^2 . \end{aligned}$$

    Minimizing \(\Psi ^2\) hence numerically adjusts matrices \({\widetilde{W}}\) and \({\widetilde{H}}\) according to the claimed structural properties. For computation we apply the MATLAB function fminsearch, which uses the simplex search method of Lagarias et al. [22].

  • Step 4: Recovering \(W_{rec}\), \(H_{rec}\), and \(P_{rec}\)       The minimization in Step 3 finally returns a transformation matrix \({\mathcal {A}}_{\text {opt}}\). We then recover the resulting kinetics \(P_{rec}\) of relative concentrations \(H_{rec}\) and the component spectra \(W_{rec}\) according to (5)–(7) as

    $$\begin{aligned} H_{rec}&= \left( {\mathcal {U}} {\mathcal {A}}_{\text {opt}} \right) ^T = {\mathcal {A}}_{\text {opt}}^T {\mathcal {U}}^T \; \in {\mathbb {R}}^{r\times m} , \\ W_{rec}&= M H_{rec}^{\dagger } = M \left( {\mathcal {A}}_{\text {opt}}^T {\mathcal {U}}^T \right) ^{\dagger } \; \in {\mathbb {R}}^{n\times r}, \\ P_{rec}&= {{{\mathcal {A}}}_{\text {opt}}}^{-1}\big ({{{\mathcal {U}}}}_-^\dagger \mathcal{U}_+\big ){{{\mathcal {A}}}_{\text {opt}}} \; \in {\mathbb {R}}^{r\times r}. \end{aligned}$$
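The data flow of Steps 1, 2 and 4 can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the reference point is assumed to be the mean spectrum, PCCA+ and the fminsearch minimization are not reproduced, and the transformation matrix is instead computed from a known toy solution (`A_ref`), which is only possible for synthetic data. The function names `preprocess` and `recover` and all toy values are hypothetical.

```python
import numpy as np

def preprocess(M, r):
    """Step 1 (sketch): build the m x r matrix [e, leading r-1 left singular vectors]."""
    Mt = M.T                                    # rows correspond to time steps
    centered = Mt - Mt.mean(axis=0)             # assumed reference point: mean spectrum
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    e = np.ones((Mt.shape[0], 1))
    return np.hstack([e, U[:, : r - 1]])

def recover(M, U_cal, A):
    """Relations (5)-(7): H, W and P for a given transformation matrix A."""
    H = (U_cal @ A).T
    W = M @ np.linalg.pinv(H)
    U_minus, U_plus = U_cal[:-1], U_cal[1:]     # delete last row / first row
    P = np.linalg.inv(A) @ np.linalg.pinv(U_minus) @ U_plus @ A
    return W, H, P

# Hypothetical toy data: Markov kinetics for 3 species and random positive spectra.
rng = np.random.default_rng(0)
P_true = np.array([[0.90, 0.10, 0.00],
                   [0.05, 0.80, 0.15],
                   [0.00, 0.00, 1.00]])
m = 60
H_true = np.zeros((3, m))
H_true[:, 0] = [1.0, 0.0, 0.0]
for i in range(1, m):
    H_true[:, i] = H_true[:, i - 1] @ P_true
W_true = rng.random((40, 3)) + 0.1
M = W_true @ H_true

U_cal = preprocess(M, 3)
# Stand-in for the optimized PCCA+ matrix: uses the known H_true (toy setting only).
A_ref = np.linalg.pinv(U_cal) @ H_true.T
W_rec, H_rec, P_rec = recover(M, U_cal, A_ref)
print("H recovered:", np.allclose(H_rec, H_true, atol=1e-8))
print("W recovered:", np.allclose(W_rec, W_true, atol=1e-8))
print("P recovered:", np.allclose(P_rec, P_true, atol=1e-8))
```

The sketch shows that all three matrices are determined by the processed data and a single \(r\times r\) matrix, which is why the minimization in Step 3 only has to search over \({\mathcal {A}}\).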

In regard to NMF in the context of Raman data spectral analysis, our novel approach offers two main advancements: Firstly, in contrast to the method of Luce et al. [5], our novel NMF approach does not depend on the separability assumption. Since we only consider the general properties of the sought-for matrices without further demands on the input data, we may apply the novel approach to the broader range of even non-separable spectral data. Secondly, note the possibility to manipulate the decisive objective function in Step 3 by the choice of weighting coefficients \(\alpha , \beta ,\gamma ,\delta\) and \(\mu\) or by addition of further penalty terms. This flexibility and adaptability of our method allows for a special focus on certain data properties or even an extension of the recovery objectives. We remark that the approach of optimizing \(P_{rec}\) has already been suggested in [23] and that (7) has recently been applied in [24].

The next section presents some numerical experiments.

4 Numerical results

In this section we assess the performance of our novel NMF approach by applying it to a sequence of artificial time-resolved Raman spectral data. After describing the reaction data generation in Sect. 4.1, we show in Sect. 4.2 that the component spectra are recovered to a high quality and that we even reach meaningful approximations of the underlying reaction kinetics. In Sect. 4.2 we also demonstrate the effectiveness of our method in the case of increased overlap among the individual component spectra and in the presence of measurement noise. In Sect. 4.3, we present real-world data from Raman spectroscopy measured during a crystallization process of paracetamol in ethanol. We show that our method can help to identify and characterize intermediate states (and their lifetimes) of a chemical process.

4.1 Description of the reaction data generation

As in Sect. 2, for the model of time-resolved Raman spectral data, we here again follow the framework of Luce et al. [5].

Regarding the generation of artificial time-resolved Raman spectral data we consider a reaction scheme with five involved species A, B, C, D and E which are inter-related by first-order reactions. These first-order reactions are characterized by a rate matrix of transition coefficients as follows:

$$\begin{aligned} K \; = \; \begin{bmatrix} -\,0.53 &{} 0.53 &{} 0 &{} 0 &{} 0 \\ 0.02 &{} -\,0.66 &{} 0.43 &{} 0.21 &{} 0 \\ 0 &{} 0.25 &{} -\,0.36 &{} 0 &{} 0.11 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0.1 &{} 0 &{} -\,0.1 \end{bmatrix} \end{aligned}$$

The rows \(i =1, \dots ,5\) of K reflect the transition behaviour of the corresponding species in the course of the observed reaction. For instance, \(K_{12}=0.53\) means that species A is converted into species B at a rate of 0.53 per (arbitrary) unit of time. The diagonal entries of K represent the negative total loss rate of each species. Thus we already notice species D to be the only product of this modeled reaction, as it is the only species that exclusively absorbs rates (its row of K is zero). Here, we let species A be the only educt of the reaction and therefore denote the initial concentration vector as \(h_0 :=h(t_0) =~\left[ 1, 0, 0, 0, 0 \right] ^T\). With \(h_0\) and rate matrix K we obtain the reaction kinetics as a function of time by

$$\begin{aligned} h(t)^T = \left[ h_1 (t), \dots , h_5 (t) \right] = h_0^T e^{Kt}, \end{aligned}$$

where \(h_i (t)\) denotes the relative concentration of species i at time t. The resulting kinetics are displayed in Fig. 2 (right). We gain the corresponding matrix H of kinetics by discretization of h(t) at equidistant time steps \(t_0, \dots , t_{m-1}\) such that \(H=~\left[ h(t_0), \dots , h(t_{m-1}) \right]\).
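The kinetics matrix H can be generated from K as follows. This is a minimal sketch: the number of time steps `m` and the step size `dt` are hypothetical (the article does not state them here), and the matrix exponential is approximated by a small scaling-and-squaring Taylor routine (`expm_taylor`, a helper written for this example) to keep the block free of further dependencies.

```python
import numpy as np

K = np.array([
    [-0.53,  0.53,  0.00, 0.00,  0.00],
    [ 0.02, -0.66,  0.43, 0.21,  0.00],
    [ 0.00,  0.25, -0.36, 0.00,  0.11],
    [ 0.00,  0.00,  0.00, 0.00,  0.00],
    [ 0.00,  0.00,  0.10, 0.00, -0.10],
])

def expm_taylor(A, terms=20, squarings=10):
    """Approximate exp(A): Taylor series of A / 2^s, followed by s squarings."""
    A_scaled = A / (2 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A_scaled / k      # A_scaled^k / k!
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

m, dt = 100, 0.25                        # hypothetical discretization
P_step = expm_taylor(K * dt)             # one-step propagator exp(K dt)

H_kin = np.zeros((5, m))
H_kin[:, 0] = [1.0, 0.0, 0.0, 0.0, 0.0]  # species A is the only educt
for i in range(1, m):
    # h(t_i)^T = h(t_{i-1})^T exp(K dt), i.e. h_0^T exp(K t_i) on an equidistant grid
    H_kin[:, i] = H_kin[:, i - 1] @ P_step

print("P_step row sums:", P_step.sum(axis=1))
print("final concentrations:", H_kin[:, -1])
```

Because the rows of K sum to zero, the one-step propagator is row stochastic, which connects this generation scheme to relation (2).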

The single component spectra are built up as arbitrary sums of Lorentzians, which we illustrate in Fig. 2 (left). The five columns of matrix W accordingly contain the discretized intensity-by-wavenumber signals.
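A component spectrum built from Lorentzians can be sketched as follows. The peak-normalized line shape used here is a common parametrization and an assumption of this example; the wavenumber axis, the peak parameters and the restriction to two species are hypothetical and do not reproduce the spectra of Fig. 2.

```python
import numpy as np

def lorentzian(x, x0, gamma, amplitude=1.0):
    """Lorentzian line centred at base point x0 with half width gamma and peak height amplitude."""
    return amplitude * gamma**2 / ((x - x0) ** 2 + gamma**2)

wavenumbers = np.linspace(0.0, 2000.0, 500)    # hypothetical wavenumber axis

# One list of (x0, gamma, amplitude) tuples per species (hypothetical values).
peaks = [
    [(300.0, 15.0, 1.0), (1200.0, 20.0, 0.6)],
    [(700.0, 10.0, 0.8), (1500.0, 25.0, 1.0)],
]

# Each column of W_lor is the sum of the Lorentzians of one species.
W_lor = np.column_stack([
    sum(lorentzian(wavenumbers, x0, g, a) for (x0, g, a) in species)
    for species in peaks
])
print(W_lor.shape)
```

Moving the base points `x0` of different species closer together increases the spectral overlap between the columns of the resulting matrix.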

Fig. 2 Illustration of artificially generated component spectra (left) and kinetics of first-order reactions (right) including five species A to E. The assignment of color to species holds for both panels. The resulting time-resolved measurement data are displayed in Fig. 3 (top)

The spectral overlap among the single component spectra is adjustable. This means we may increase the level of spectral interference by moving all base points \(x_0\) of the generated Lorentzians towards certain focal points. The level of spectral interference determines the level of separability of the measurement data. While the results in [5] are based on near-separability because of low spectral interference, we demonstrate the effectiveness of our method even in the case of high interference among the component spectra.

The resulting measurement data matrix M is obtained as the product of matrix W of component spectra and matrix H of the underlying reaction kinetics as \(M=WH\). See Fig. 3 (top) for an interpolated visualization of M.

4.2 Recovery results

Considering the measurement data, according to the artificial reaction scheme as introduced in the previous Sect. 4.1, our goal is now to recover the single component spectra as well as the reaction kinetics only given matrix M. In other words, we compute matrices \(W_{rec}\) and \(H_{rec}\) by applying our novel NMF approach to M. We thereby are especially interested in the reconstruction of the true component spectra W in order to provide a powerful tool for compound identification in real-life Raman spectral analysis. Recall that the objective function in our approach is based on adding up the penalty terms in (8), which represent the structural properties of the sought-for matrices and which are weighted by choice of the coefficients \(\alpha , \beta\) and \(\gamma\). In this section we present the results of our method for the predefinitions

$$\begin{aligned} \alpha = -0.0001, \; \; \beta = -1 \; \text { and } \; \gamma = 1. \end{aligned}$$
(9)

Recall additionally that we applied singular value decomposition in the preprocessing of our computational method. That is why the order of species in the recovered matrices \(W_{rec}\) and \(H_{rec}\) may be permuted in comparison to the order in the exact matrices W and H. For comparative visualization of our recovery results, we thus compute the correlation coefficients between the columns (\(\sim\) species) of \(W_{rec}\) and W and associate the spectra as well as the reaction kinetics according to the maximal correlation values.
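The matching of recovered to true species by maximal correlation can be sketched as below. The helper name `match_columns` and the toy data are hypothetical. Note that taking the row-wise maximum is a greedy rule: it returns a proper permutation only if each recovered spectrum clearly correlates best with a different true spectrum.

```python
import numpy as np

def match_columns(W_rec, W_true):
    """For each column of W_rec, return the index of the best-correlated column of W_true."""
    r = W_true.shape[1]
    # np.corrcoef treats rows as variables, so both matrices are passed transposed.
    # The upper-right r x r block holds corr(W_rec[:, i], W_true[:, j]).
    C = np.corrcoef(W_rec.T, W_true.T)[:r, r:]
    return C.argmax(axis=1)

# Toy check: permuted and rescaled copies of the true spectra.
rng = np.random.default_rng(0)
W_true_toy = rng.random((50, 3))
perm = np.array([2, 0, 1])
W_rec_toy = 2.0 * W_true_toy[:, perm]

order = match_columns(W_rec_toy, W_true_toy)
print(order)   # column i of W_rec_toy is matched to column order[i] of W_true_toy
```

The same index vector can then be used to pair row i of the recovered kinetics with row `order[i]` of the exact kinetics.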

Exemplary recovery results of our novel method for the noiseless case with low spectral interference are displayed in Fig. 4. Especially the recovery of components A, B and D is nearly exact: the positions as well as the heights of the peaks can hardly be distinguished visually from the original data. In the bottom right panel we also present the recovery result for the matrix H of reaction kinetics.

Fig. 3 Interpolated visualization of the measurement data matrix M: on top, the case of well-separated component spectra and no measurement noise. Below, a variant with increased spectral interference and noise contamination

Fig. 4 Reconstructed component spectra of the single species and reaction kinetics (bottom right) for noiseless Raman data. The spectra of compounds A, B and D are recovered nearly exactly. Inaccuracies in the lower wavenumber regions occur for compounds C and E. Furthermore, our computed kinetics reflect the rough behaviour of the real kinetics

As in all upcoming illustrations of the reconstructed kinetics, the dotted lines are assigned to their species through the corresponding color in the spectral panels. For comparison, the exact kinetics (black lines) represent the kinetics from Fig. 2 (right). Indeed our reconstructed kinetics in Fig. 4 reflect the general trends of the exact kinetics as in particular species A is recognized to be the only educt, and species D to be the exclusive product of the generated reaction scheme.

As the first extension of the data setting, we now investigate the effectiveness of our method in the case of increased spectral interference. As mentioned in Sect. 4.1, we generate increased spectral interference among the component spectra in W by moving the base points \(x_0\) in all species towards three focal points. We then obtain component spectra as displayed in Fig. 5.

Fig. 5 Component spectra for modest spectral interference. In comparison to the spectra in Fig. 2 (left), notice how the base points of the Lorentzians have been moved closer to each other

In Fig. 6 we present the results of our novel approach being applied to very interference-rich measurement data. Besides the remaining high quality in the recovery of components A, B and D, the reconstruction of species C and E apparently improved compared to the results in Fig. 4. In this interference-rich case our method computes the coordinates of the peaks in all component spectra quite satisfactorily. Concerning the recovery of the reaction kinetics, displayed in the bottom right panel, we again precisely identify the educt and the product of the reaction.

Fig. 6
figure 6

Reconstructed component spectra of the single species and reaction kinetics (bottom right) for the case of high spectral interference. Note the improvements in the recovery of species C and E in comparison to Fig. 4. In addition, the educt and the product of the reaction are clearly recognizable in the recovery of reaction kinetics

Fig. 7
figure 7

Reconstructed component spectra of the single species and reaction kinetics (bottom right) for interference-rich and noisy measurement data. The spectral recoveries still show a reasonable agreement with the true spectra. The main traits of the reaction kinetics are recognizable as well

As the second extension of our data setting, we consider the recovery results of our routine when the data are additionally contaminated by measurement noise. In any practical setting, Raman spectral analysis needs to deal with this issue since, for instance, signal shot noise or background noise appears in any real experimental data. Here we assume the noise from all different sources to be adequately represented by additive Gaussian white noise, which disturbs the measurement matrix M according to

$$\begin{aligned} {\tilde{M}} = M + \delta \; \text {abs} \left( N \right) . \end{aligned}$$

The entries of N are drawn from the normal distribution \({\mathcal {N}} (0,1)\), and \(\delta =0.5\) is the relative noise level. See Fig. 3 (bottom) for an interpolated visualization of the interference-rich and noisy measurement matrix \({\tilde{M}}\). Applying our novel NMF approach with the predefinitions in (9) to \({\tilde{M}}\), the results in Fig. 7 show that the recovered component spectra still agree reasonably well with the exact spectra. Furthermore, the main traits of the true reaction kinetics are recognizable in the recovered kinetics as well.
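The noise model above can be sketched in a few lines of NumPy. The function name `add_noise`, the random seed and the small test matrix are assumptions made for illustration; only the formula \({\tilde{M}} = M + \delta \, \text{abs}(N)\) and the value \(\delta = 0.5\) are taken from the text.

```python
import numpy as np

def add_noise(M, delta, rng):
    """Return M + delta * |N| with the entries of N drawn from N(0, 1)."""
    N = rng.standard_normal(M.shape)
    return M + delta * np.abs(N)

# Hypothetical non-negative measurement matrix (illustrative only).
rng = np.random.default_rng(0)
M = np.ones((4, 6))
M_noisy = add_noise(M, delta=0.5, rng=rng)
```

Because the absolute value of N is added, every entry of the perturbed matrix is at least as large as the corresponding entry of M, so a non-negative M remains non-negative, as an NMF requires.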

4.3 Example: paracetamol in ethanol

We use experimental time-resolved Raman spectroscopy data of paracetamol as an example to demonstrate the application and usability of our NMF algorithm. Paracetamol is polymorphic, i.e., it crystallizes in different forms. These forms behave differently when the drug is processed into its final tablet formulation. The bioavailability of the drug can also differ from one form to another [25]. Control over crystallization is thus required in order to manufacture suitable tablets, and it is important to study crystallization empirically under different conditions such as solvent and cooling rate. One important aspect is the choice of solvent: different solvents yield different polymorphs of paracetamol [26].

Crystallization studies from liquid solutions were performed in a custom-made acoustic levitator [27], i.e., the droplet of the solution is held in a stable and undisturbed position by means of an ultrasonic field. The acoustic levitator allows contact-free crystallization studies and in situ measurements. The environment around the sample can be controlled with respect to surface, temperature, and humidity by passing a cool or hot stream of nitrogen. During the experiment the solvent evaporates, which leads to a gradual increase of the concentration in the droplet until it finally crystallizes (Fig. 8). Time-resolved Raman spectroscopy is performed with a time resolution of 3 s during this crystallization process. Various pathways from the solution phase of the drug molecules to the final crystallized phase have been suggested. An intermediate metastable polyamorphic state has been reported, in which the paracetamol molecules exist in transient disorganised clusters and undergo ordering to reach the final, highly ordered crystal structure [28]. With our method, we were able not only to understand the kinetics of the intermediate phase, but also to calculate the spectrum of the intermediate state. These data are crucial for understanding, and thus controlling, the crystallization of a drug substance. The measurements are shown in Fig. 9.

Fig. 8
figure 8

Paracetamol polymorph type I crystallizing in an acoustically levitated droplet of its supersaturated solution in ethanol

Fig. 9
figure 9

In real-world applications, sequential measurements of Raman spectra lead to input data for NMF. The intensity of different wavenumbers is measured at different timesteps

Fig. 10
figure 10

During the crystallization, solvated paracetamol (black spectrum) passes through an intermediate amorphous state (red spectrum) which then immediately turns into a crystal structure (green spectrum). The three component spectra of this process are extracted by using NMF (Color figure online)

The following settings are used for the objective function: \(\alpha =0.00001, \beta =100, \gamma =100, \delta =1, \mu =1\). These settings put the emphasis on feasible concentrations. This means that we focus on providing a matrix \(H_{rec}\) with non-negative entries and column sums equal to 1, such that Fig. 11 shows mathematically feasible concentration curves. \(\alpha\) is set to a very low value because the intensities of the spectra are orders of magnitude higher than the entries in \(H_{rec}\) or \(P_{rec}\). After applying the optimization approach of Algorithm 1, the matrices \(H_{rec}\) and \(W_{rec}\) in particular are important experimental findings. They show the spectra of the intermediate state and of the final crystal form of paracetamol (Fig. 10) as well as the kinetics of the crystallization process (Fig. 11). The matrix \(P_{rec}\) is:

$$\begin{aligned} P_{rec}= \begin{pmatrix} 1.00 &{} 0.00 &{} 0.00 \\ 0.02 &{} 0.98 &{} 0.00 \\ -\,0.01 &{} 0.02 &{} 0.99 \end{pmatrix}. \end{aligned}$$

This matrix represents the approximated Galerkin projection (3 states) of a transition process in a continuous space (the microscopic 3D arrangement of the atoms in the droplet). The third row of \(P_{rec}\) represents the initial state. The second row is the intermediate state; there is zero probability of going back from this state to the initial state. The first row represents the stable final crystal. The upper right part of \(P_{rec}\) is zero because the crystallization process is directed. Figure 11 shows a nearly linear decay of the initial state, whereas in reaction kinetics we usually expect exponential decay. The matrix is just the optimal fit to a presumed kinetics according to the chosen objective function. Depending on the optimization criterion, one can obtain different results from NMF of the given raw Raman spectroscopy data.

These results can be checked using a cross-validation method to confirm the mathematical interpretation of the chemical process. We compared the results of NMF with simultaneous time-lapse photography of the droplet, the first of its kind to be used as a watchdog for checking results obtained from NMF against the experimental observations. Besides the agreement between the time step of the phase change observed in the concentration curves and the experimental time steps, the results are validated by the fact that the peaks reported for the metastable intermediate amorphous state closely match our calculated spectra. The peaks in the red curve for the measured intermediate state at 1236 cm\(^{-1}\), 1326 cm\(^{-1}\) and 1618 cm\(^{-1}\), to name a few of many, match the calculated peaks at 1235 cm\(^{-1}\), 1327 cm\(^{-1}\) and 1619 cm\(^{-1}\) [28]. Naturally, the peaks for the final moieties can also be verified and are in accordance with reported experimental data. Structural changes predicted with NMF are verified on the basis of this recording.
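The structural properties of \(P_{rec}\) described above can be checked numerically. The following is a minimal sketch, not the authors' procedure. It assumes a row-vector convention in which a state distribution \(x_k\) is propagated as \(x_{k+1} = x_k P_{rec}\); this convention, the helper `propagate` and the number of steps are assumptions made for illustration. Only the entries of \(P_{rec}\) are taken from the text.

```python
import numpy as np

# Entries of P_rec as printed above (rows: final crystal, intermediate, initial state).
P_rec = np.array([
    [1.00, 0.00, 0.00],
    [0.02, 0.98, 0.00],
    [-0.01, 0.02, 0.99],
])

row_sums = P_rec.sum(axis=1)              # each row should sum to (approximately) 1
upper = P_rec[np.triu_indices(3, k=1)]    # strictly upper triangular entries
min_entry = P_rec.min()                   # slightly negative: fit is only approximately stochastic

def propagate(P, x0, steps):
    """Iterate x_{k+1} = x_k P (assumed row-vector convention) and return all states."""
    traj = np.empty((steps + 1, P.shape[0]))
    traj[0] = x0
    for k in range(steps):
        traj[k + 1] = traj[k] @ P
    return traj

# Start with all weight in the initial state (third row/column).
traj = propagate(P_rec, np.array([0.0, 0.0, 1.0]), steps=50)
```

Under this assumed convention the weight of the initial state after k steps is \(0.99^k\), i.e., the geometric (exponential-type) decay that a transition-matrix model implies. The small negative entry \(-0.01\) shows that the fitted matrix is only approximately a stochastic matrix.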

Fig. 11
figure 11

Using NMF, the three (\(r=3\)) different component spectra show up during the course of time with different relative weights (concentrations). The black curve indicates the initial moieties, the red curve denotes the intermediate moieties, and the green curve indicates the final crystallized polymorph. The matrix \(H_{rec}\) contains the kinetics of the crystallization process

5 Conclusion

Summarizing, our novel NMF approach returns remarkable and robust results in the recovery of component spectra and reaction kinetics while the method is mainly based on the general structural properties of the sought-for matrices. The recovery results of our approach even indicate that the quality of the recovered component spectra improves as the spectral overlap among the component spectra increases. Our approach can therefore be considered as a complement to the method of Luce et al. [5] since the success of their method especially depends on low spectral interference (near-separability of M).