1 Introduction

In molecular spectroscopies that use electromagnetic radiation, the fundamental problem is to determine the unknown chemical composition of a given substance, either alone or mixed with other species. One of the most versatile spectroscopies is based on nuclear magnetic resonance (NMR) [1,2,3,4,5,6,7]. In NMR spectroscopy, a specimen placed in a static magnetic field is resonantly excited by radiofrequency (RF) pulses. As a result, the sample generates a time signal, or free induction decay (FID), which is the coherent transient response of the system to the external perturbations.

The main physical characteristics of the FID are its fast oscillations and the commensurate echoes that follow the RF pulses. First, the energy of the external electromagnetic waves is absorbed by the nuclear spin system of the molecules in the sample. The excited system, in its return to equilibrium, emits the surplus energy. This emission induces a current in a coil surrounding the sample. The current is subsequently converted into a digitized time signal, the FID.

At the dawn of NMR, the measuring devices were simple and, for this reason, the technique was quickly adopted in many physics and chemistry laboratories across the world. Can such a conceptually simple mechanism reveal the detailed inner structure of a sample? The answer is affirmative and, moreover, straightforward. The FID is the unique time-domain representation of the absorptive and dispersive features of the specimen, and these two characteristics are the fingerprints of its chemical composition.

Physically, the essence of NMR spectroscopy lies in its time-resolved nature. However, this key feature is lost or smoothed out when the FID is analyzed by the customary Fourier transform techniques. The discrete and fast Fourier transforms (DFT, FFT) integrate the given FID. This integration smooths out the finer details of the FID and can provide only a rough average representation of the system response in the frequency domain. Mathematically, the Fourier transform, as a linear mapping, is given by a single polynomial \(F_N\) of degree \(N-1,\) where N is the total signal length. Because of this linearity, all the noise in the input data (measured in the time domain) is carried intact into the output data (computed in the frequency domain). In other words, noise from measurements is not mitigated by conventional Fourier analysis.
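As a minimal NumPy sketch of the linearity argument (the synthetic two-line FID, the noise level and the length N = 1024 are illustrative assumptions, not the measured data), the FFT of signal plus noise equals the sum of the separate FFTs, and by Parseval's theorem the noise energy passes through the transform unchanged:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N = 1024                      # illustrative signal length
tau = 1.0 / 6667.0            # illustrative sampling time in s (BW = 6667 Hz)
t = np.arange(N) * tau

# Synthetic noiseless FID: two damped complex exponentials (hypothetical values).
signal = (1.0 * np.exp((2j * np.pi * 500.0 - 20.0) * t)
          + 0.5 * np.exp((2j * np.pi * 1200.0 - 35.0) * t))
noise = 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

spec_noisy = np.fft.fft(signal + noise)
spec_clean = np.fft.fft(signal)
spec_noise = np.fft.fft(noise)

# Linearity: F(signal + noise) = F(signal) + F(noise); the output noise is
# exactly the transformed input noise.
linear_ok = np.allclose(spec_noisy, spec_clean + spec_noise)
# Parseval (unnormalized numpy FFT): sum |F(noise)|^2 = N * sum |noise|^2,
# i.e. the transform neither removes nor adds noise energy.
parseval_ok = np.isclose(np.sum(np.abs(spec_noise) ** 2), N * np.sum(np.abs(noise) ** 2))
print(linear_ok, parseval_ok)  # True True
```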

The fast Padé transform (FPT) paves a different road [8,9,10,11,12,13]. It models the response of a resonantly excited system by a nonlinear digital filter. This yields a spectrum which represents a frequency-dependent response function of the system as the unique ratio of two \(K-\)degree polynomials \(P_K/Q_K.\) The nonlinearity of this quotient can suppress noise from the input FID. Moreover, the Padé nonlinear digital filter reshapes an incident perturbation waveform to reconstruct the input FID. This retrieval, in fact, simultaneously parametrizes both the spectrum and the FID in terms of the same basic characteristics, the K resonance frequencies \(\omega _k\) and the corresponding amplitudes \(d_k\) \((1\le k\le K)\), both complex-valued.

Here, K reflects three fundamental characteristics of the system: it is the number of complex, harmonically oscillating, damped exponentials, the number of components in the total shape spectrum (envelope) and the number of resonating molecules in the sample. The real and imaginary parts of \(\omega _k\) give the position and width of the \(k\,\)th resonance, whereas the absolute value and phase of \(d_k\) give the magnitude (intensity) and phase of the \(k\,\)th oscillation in the FID.

Algorithmically, the FPT first extracts the polynomials \(P_K\) and \(Q_K\) from a single system of linear equations using all or a subset of the N input FID data points. This initial step defines the nonparametric FPT. It gives only the envelope spectrum \(P_K/Q_K\) (no components) at any sweep frequency \(\omega \) (equidistant or not, the Fourier grid or not). The Padé spectrum can be computed by the fast Euclid algorithm with \(N(\textrm{log}_2N)^2\) multiplications comparable to the corresponding computational cost \(N\textrm{log}_2N\) in the FFT for \(N=2^{\ell }(\ell =1,2,3,\ldots )\).
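For orientation, the ratio of the two multiplication counts quoted above is simply \(\textrm{log}_2N\) (constant prefactors ignored). For the averaged FIDs of length \(N=16384=2^{14}\) analyzed in Sect. 3, a worked comparison reads:

$$\begin{aligned} \frac{N(\textrm{log}_2N)^2}{N\textrm{log}_2N}=\textrm{log}_2N=14,\qquad N\textrm{log}_2N=229376,\qquad N(\textrm{log}_2N)^2=3211264. \end{aligned}$$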

To parametrize the studied system, the denominator polynomial \(Q_K\) is rooted to extract the K resonance frequencies \(\omega _k.\) The associated K amplitudes \(d_k\) are obtained from the Cauchy residue formula of the complex envelope \(P_K/Q_K\) taken at \(\omega =\omega _k\) \((1\le k\le K).\) This is the parametric FPT, which, from the found resonance parameters \(\{\omega _k,d_k\},\) generates the \(k\,\)th component spectrum as the \(k\,\)th Heaviside partial fraction of \(P_K/Q_K.\) The sum of these K components (partial fractions) is the envelope \(P_K/Q_K\) in the parametric FPT. For the same K, the envelopes predicted by the nonparametric and parametric FPT are identical.

The nonparametric and parametric FPT belong to the categories of shape and parameter estimators, respectively, in the field of signal processing. A shape estimator gives only the form of a frequency-dependent lineshape profile, with no quantitative information. A parameter estimator explicitly solves the quantification problem by finding the peak parameters (position, width, height, phase). While the FFT can only be a shape estimator, the FPT can be employed for both shape and parameter estimation.

Although both the FFT and the nonparametric FPT supply only the envelopes, the latter processor is more advantageous. The reason lies in the mathematical structure of the rational polynomial \(P_K/Q_K\) from the FPT compared to the single polynomial \(F_N\) in the FFT. The Padé spectrum \(P_K/Q_K\) has zeros and poles as the roots of \(P_K\) and \(Q_K,\) respectively. In the nonparametric FPT, neither the poles nor the zeros of \(P_K/Q_K\) are explicitly determined. Nevertheless, they are implicitly contained in the envelope \(P_K/Q_K\) and, whenever they are spurious (unphysical, noisy), they cancel each other. This amounts to noise reduction. Functions (e.g. NMR spectra) whose frequency representations exhibit sharp lineshapes are described well by a rational polynomial \(P_K/Q_K,\) since the poles of \(P_K/Q_K\) yield spectral lines (resonances, peaks). Consequently, the FPT can use fewer FID data points to give, e.g., an accurate frequency spectrum of the sample.

Overall, for the same input FID, the envelope from the nonparametric FPT is less noisy than its counterpart in the FFT which, as stated, cannot suppress noise from the processed time signal. Classical Fourier analysis deals with a slowly converging series, which expands a function in terms of undamped sine and cosine oscillations. This series (just like the mentioned single polynomial \(F_N\)) can have only zeros. The absence of poles forces the use of a large number of expansion coefficients in the Fourier series to mimic sharp spectral lines. The lack of a polar structure is directly reflected in the poorer resolution of the FFT relative to the FPT. Consequently, for the same time signal length N, the resolving power of the FPT spectrum is doubled compared to the FFT.

This is because the Padé series, obtained from \(P_K/Q_K\) by expanding \(1/Q_K,\) reproduces twice as many exact input coefficients (time signal points) as the Fourier analysis does. The FFT is not a predictive model. Having exhausted the given subset of the N input data points from the FID, this processor cannot give any information about the remaining (ignored) time signal points. In fact, if, e.g., one half of the given FID is used, the FFT treats the other half as zeros. The FPT is a predictive model. It extrapolates the employed subset of the FID beyond the number of time signal points used. Extrapolation amounts to prediction. A part of the extrapolated time signal coincides exactly with the unused remainder of the FID.
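To make the prediction statement concrete, the following sketch assumes a noiseless model FID \(c_n=\sum _{k=1}^Kd_ku_k^n\) with K = 3 hypothetical harmonics \(u_k\) and amplitudes \(d_k\). With \(q_0=1\) fixed, the K coefficients \(\{q_s\}\) and the K+1 coefficients \(\{p_r\}\) are determined by the 2K+1 points \(c_0,\ldots ,c_{2K}\); the expansion of \(P_K/Q_K\) then continues a linear recursion that regenerates the unused signal points. This is an idealized illustration: with noisy data and an unknown number of harmonics, the prediction is no longer exact.

```python
import numpy as np

# Hypothetical noiseless FID with K = 3 damped complex exponentials:
#   c_n = sum_k d_k * u_k**n   (illustrative values only)
u = np.array([0.95 * np.exp(2j * np.pi * 0.10),
              0.90 * np.exp(2j * np.pi * 0.27),
              0.97 * np.exp(-2j * np.pi * 0.18)])
d = np.array([1.0, 0.5 + 0.5j, 0.3])
K = len(u)
n_total = 40
n = np.arange(n_total)
c = (d[None, :] * u[None, :] ** n[:, None]).sum(axis=1)

# Only c_0..c_{2K} enter the fit. With q_0 = 1, the coefficients of x**j
# (K+1 <= j <= 2K) in S(x) * Q_K(x) must vanish:
#   sum_{s=1}^{K} q_s * c_{j-s} = -c_j
A = np.array([[c[j - s] for s in range(1, K + 1)] for j in range(K + 1, 2 * K + 1)])
b = -c[K + 1:2 * K + 1]
q = np.concatenate(([1.0], np.linalg.solve(A, b)))

# Expanding P_K/Q_K in powers of x continues the same recursion for n > K:
#   c_n = -sum_{s=1}^{K} q_s * c_{n-s}
pred = list(c[:2 * K + 1])
for m in range(2 * K + 1, n_total):
    pred.append(-sum(q[s] * pred[m - s] for s in range(1, K + 1)))
pred = np.array(pred)

max_err = np.max(np.abs(pred[2 * K + 1:] - c[2 * K + 1:]))
print(f"{2 * K + 1} points used, {n_total - 2 * K - 1} predicted, max error {max_err:.1e}")
```

Here the rows of `A` express the vanishing of the coefficients of \(x^{K+1},\ldots ,x^{2K}\) in \(S(x)Q_K(x)-P_K(x)\), with \(x=z^{-1}\).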

Traditional shape estimation cannot parametrize the sample, as stated. However, this restriction applies only to nonderivative signal processing. The limitation is surmounted by the nonparametric derivative fast Padé transform (dFPT), which is a shape estimator [14,15,16,17,18,19,20,21,22,23,24,25]. The \(m\,\)th derivative with respect to frequency, applied to the nonparametrically generated envelope \(P_K/Q_K,\) simultaneously improves resolution and signal-to-noise ratio (SNR). This results from linewidth narrowing and background baseline flattening in the nonparametric dFPT spectrum. For certain m, the spectral lines in an envelope from the nonparametric dFPT become so sharp that they collapse onto the components from the parametric dFPT. Reconstructing the exact components from a nonparametric envelope amounts to performing quantification. As such, the peak signature (position, width, height) can be extracted from sharp, isolated magnitude-mode spectral lines in the envelope generated by the nonparametric dFPT. The dFPT as a shape estimator thereby becomes a quantifier.

These main features are illustrated in the present study with the nonparametric and parametric variants of the nonderivative FPT and derivative dFPT in the context of a diagnostic problem in cancer medicine associated with proton NMR spectroscopy (alternatively called proton magnetic resonance spectroscopy and denoted by \(^1\)H MRS). These processors are applied to time signals encoded by high-field proton NMR spectroscopy from two patients with ovarian lesions, one benign and the other malignant. The samples used for FID encoding are ovarian cyst fluid. One of our goals is to focus on the spectral region containing the citrate resonances. Using both the benign and malignant samples, we address the prospect of considering the citrate quartet as a cancer biomarker. In other words, we want to see whether citrate could help in the differential diagnosis (benign versus malignant) of ovarian cyst fluid, so as to distinguish between noncancerous and cancerous lesions. Citrate is among the recognized cancer biomarkers for the prostate (low in cancerous prostate, high in normal prostate and in benign prostatic hypertrophy). To widen the diagnostic window in differential diagnosis, it would be important to establish citrate as a cancer biomarker for the ovary as well.

2 Theory

Throughout this study, we employ the diagonal fast Padé transform, FPT. Specifically, this is the \(\textrm{FPT}^{(-)}\) variant with the harmonic variable \(z^{-1}=\textrm{e}^{-i\omega \tau }.\) For simplicity, the notation \(\textrm{FPT}^{(-)}\) is shortened to FPT, since the alternative version \(\textrm{FPT}^{(+)}\) with the variable \(z^{+1}=\textrm{e}^{+i\omega \tau }\) will not be used. Here, \(\omega \) is the angular frequency, related to the linear frequency \(\nu \) by \(\omega =2\pi \nu .\) Further, \(\tau \) is the sampling time of the FID, defined by \(\tau =T/N,\) where N is the total signal length and T is the total acquisition time, or equivalently, the total duration of the FID.

For the given complex time signal \(\{c_n\}\, (0\le n\le N-1),\) the exact Green function (the response function, the envelope spectrum) is defined by the single Maclaurin polynomial, also known as the finite \(z-\)transform:

$$\begin{aligned} S(z^{-1})=\sum _{n=0}^{N-1}c_nz^{-n}. \end{aligned}$$
(2.1)

The subscript n of the expansion coefficient \(c_n\) labels the time signal point recorded at time \(t_n=n\tau ,\) which shows that the FID is equidistantly sampled.

In the diagonal FPT, the input complex envelope \(S(z^{-1})\) is modeled by the unique ratio of two complex polynomials \(P_K\) and \(Q_K\) of degree K in the same variable \(z^{-1}:\)

$$\begin{aligned} S(z^{-1})=\frac{P_K(z^{-1})}{Q_K(z^{-1})}. \end{aligned}$$
(2.2)

The numerator \(P_K\) and denominator \(Q_K\) polynomials are defined in terms of their expansion coefficients \(\{p_r\}\) and \(\{q_s\},\) respectively:

$$\begin{aligned} P_K(z^{-1})=\sum _{r=0}^Kp_rz^{-r},\quad Q_K(z^{-1})=\sum _{s=0}^Kq_sz^{-s}. \end{aligned}$$
(2.3)

The relation (2.2) defines two systems of linear equations, one for \(\{p_r\}\) and the other for \(\{q_s\}.\) However, once the coefficients \(\{q_s\}\) are obtained and refined by the singular value decomposition (SVD), the system for \(\{p_r\}\) need not be solved numerically since it becomes an analytical expression, the convolution of \(\{q_s\}\) and \(\{c_n\}.\) This completes the computation for the envelope \(P_K/Q_K\) in the nonparametric FPT.
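A minimal sketch of the nonparametric step, under stated assumptions: the helper names `pade_coefficients` and `envelope` are hypothetical, the SVD-based `numpy.linalg.lstsq` stands in for the SVD refinement of \(\{q_s\}\) (the exact procedure used in the FPT may differ), and the test signal is a noiseless sum of three damped exponentials. The numerator follows from the truncated convolution \(p_r=\sum _{s=0}^rq_sc_{r-s}\ (0\le r\le K)\), and the envelope is evaluated on an arbitrary, non-Fourier sweep grid:

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def pade_coefficients(c, K):
    """Hypothetical helper: diagonal Pade coefficients p_0..p_K and q_0..q_K
    (ascending powers of x = z^{-1}, normalized so that q_0 = 1) from c_0..c_{2K}."""
    c = np.asarray(c, dtype=complex)
    # Denominator: sum_{s=1}^{K} q_s c_{j-s} = -c_j for j = K+1..2K.
    A = np.array([[c[j - s] for s in range(1, K + 1)] for j in range(K + 1, 2 * K + 1)])
    b = -c[K + 1:2 * K + 1]
    # SVD-based least squares, standing in for the SVD refinement mentioned in the text.
    q_tail, *_ = np.linalg.lstsq(A, b, rcond=None)
    q = np.concatenate(([1.0 + 0j], q_tail))
    # Numerator analytically: p_r = sum_{s=0}^{r} q_s c_{r-s}, r = 0..K (truncated convolution).
    p = np.convolve(q, c[:K + 1])[:K + 1]
    return p, q

def envelope(p, q, x):
    """Nonparametric envelope P_K(x)/Q_K(x) at arbitrary points x = z^{-1}."""
    return npoly.polyval(x, p) / npoly.polyval(x, q)

# Usage with a hypothetical noiseless signal of K = 3 damped exponentials (tau = 1).
u = np.array([0.95 * np.exp(2j * np.pi * 0.10),
              0.90 * np.exp(2j * np.pi * 0.27),
              0.97 * np.exp(-2j * np.pi * 0.18)])
d = np.array([1.0, 0.5 + 0.5j, 0.3])
K = 3
c = (d * u ** np.arange(2 * K + 1)[:, None]).sum(axis=1)   # only c_0..c_{2K}

p, q = pade_coefficients(c, K)
nu = np.linspace(-0.5, 0.5, 777)      # any sweep grid, not tied to the Fourier grid
x = np.exp(-2j * np.pi * nu)          # x = z^{-1} = exp(-i omega tau) with tau = 1
env = envelope(p, q, x)
exact = (d / (1.0 - u * x[:, None])).sum(axis=1)   # sum_{n>=0} c_n x^n of the infinite FID
print(np.max(np.abs(env - exact)))
```

For this idealized signal, the envelope computed from only 2K+1 = 7 points agrees with the closed-form spectrum \(\sum _kd_k/(1-u_kz^{-1})\) of the infinitely long FID.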

The starting point of the parametric nonderivative FPT is the envelope \(P_K/Q_K\) just obtained in the nonparametric nonderivative FPT. The only supplementary numerical work in the parametric nonderivative FPT is rooting the denominator polynomial, \(Q_K=0,\) which gives the fundamental harmonics \(z^{-1}_k=\textrm{e}^{-i\omega _k\tau }\, (1\le k\le K)\) and, hence, the resonance frequencies \(\nu _k=\omega _k/(2\pi ).\) This is the only nonlinear operation, and it is exactly linearized by casting it into the equivalent eigenvalue problem of the Hessenberg (companion) matrix. The companion matrix, defined in terms of the expansion coefficients \(\{q_s\}\) of \(Q_K,\) is extremely sparse (filled with zeros everywhere except for the ratios \(-q_s/q_K\ (0\le s\le K-1)\) on the first row and unity on the first subdiagonal). Thus, its eigenvalue problem can be solved for any (small or large) value of degree K in \(Q_K.\)
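The linearization of the rooting step can be sketched as follows (the helper `companion_matrix` and the test roots are hypothetical). For \(Q_K(x)=\sum _{s=0}^Kq_sx^s\) with \(q_K\ne 0,\) the Frobenius companion matrix carries \(-q_{K-1}/q_K,\ldots ,-q_0/q_K\) on its first row and ones on its first subdiagonal, so it has at most \(2K-1\) nonzero entries, and its eigenvalues are the K roots of \(Q_K.\) This is the same construction that `numpy.roots` uses internally:

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def companion_matrix(q):
    """Frobenius companion matrix of Q_K(x) = sum_s q_s x**s (ascending q, q_K != 0).
    Its eigenvalues are the K roots of Q_K."""
    q = np.asarray(q, dtype=complex)
    K = len(q) - 1
    C = np.diag(np.ones(K - 1, dtype=complex), k=-1)   # unity on the first subdiagonal
    C[0, :] = -q[-2::-1] / q[-1]                        # -q_{K-1}/q_K, ..., -q_0/q_K
    return C

# Hypothetical roots x_k = z_k^{-1} and the corresponding ascending coefficients of Q_K.
true_roots = np.array([0.9 * np.exp(0.6j), 0.8 * np.exp(-1.9j), 0.95 * np.exp(2.8j), 0.7])
q = 2.5 * npoly.polyfromroots(true_roots)     # arbitrary non-monic scaling
C = companion_matrix(q)
eig = np.linalg.eigvals(C)                    # a linear eigenvalue problem replaces rooting

# Each true root is matched by one eigenvalue.
match = max(np.min(np.abs(eig - r)) for r in true_roots)
print(C.shape, np.count_nonzero(C), match)
```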

The strengths of the eigenharmonics \(\{z^{-1}_k\}\) are obtained from the behavior of the system response function \(P_K/Q_K\) in the vicinity of its singularities. The Padé spectrum \(P_K/Q_K\) is a meromorphic function, since its only singularities are the K poles \(\{z^{-1}_k\}.\) The roots of \(P_K\) and \(Q_K\) are, respectively, the zeros and poles of \(P_K/Q_K.\) The behavior of \(P_K/Q_K\) around its poles \(\{z^{-1}_k\}\) is prescribed by the Cauchy calculus for functions of a complex variable. The residue theorem from this calculus gives the strength of \(z^{-1}_k,\) which is simultaneously the complex amplitude \(d_k\) of the \(k\,\)th harmonic in each FID data point \(c_n.\) Thus, each \(d_k\) is given by an analytical formula, the Cauchy residue of \(P_K/Q_K\) taken at \(z^{-1}=z^{-1}_k.\) For a nondegenerate spectrum (no two roots of \(Q_K\) coincide), this expression is:

$$\begin{aligned} d_k=\frac{P_K(z^{-1}_k)}{Q'_K(z^{-1}_k)},\,\,Q'_K(z^{-1})=\frac{\text{ d }}{\text{ d }z^{-1}}Q_K(z^{-1}), \,\,Q'_K(z^{-1}_k)\ne 0. \end{aligned}$$
(2.4)
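As a numerical check of Eq. (2.4), the sketch below uses a hypothetical diagonal quotient with K = 4 (random numerator, prescribed distinct poles). For \(\deg P_K=\deg Q_K,\) the residues \(r_k=P_K(x_k)/Q'_K(x_k)\) at the simple poles \(x_k\) reconstruct the quotient as \(p_K/q_K+\sum _kr_k/(x-x_k).\) How these residues translate into the FID amplitudes \(d_k\) depends on the variable convention adopted for the components, so this sketch only verifies the residue formula and the partial-fraction identity:

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

# Hypothetical diagonal quotient P_K/Q_K with K = 4 (ascending coefficients in x = z^{-1}).
rng = np.random.default_rng(seed=1)
K = 4
q = npoly.polyfromroots([0.9 * np.exp(0.6j), 0.8 * np.exp(-1.9j), 0.95 * np.exp(2.8j), 0.7])
p = rng.standard_normal(K + 1) + 1j * rng.standard_normal(K + 1)

poles = npoly.polyroots(q)                                        # companion-matrix eigenvalues
res = npoly.polyval(poles, p) / npoly.polyval(poles, npoly.polyder(q))   # Eq. (2.4)

# Heaviside partial fractions of a diagonal quotient (deg P_K = deg Q_K):
#   P_K(x)/Q_K(x) = p_K/q_K + sum_k res_k / (x - x_k)
x = 3.0 * np.exp(2j * np.pi * np.linspace(0.0, 1.0, 64, endpoint=False))  # points away from poles
lhs = npoly.polyval(x, p) / npoly.polyval(x, q)
rhs = p[-1] / q[-1] + (res / (x[:, None] - poles)).sum(axis=1)
print(np.max(np.abs(lhs - rhs)))
```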

The \(m\,\)th derivative fast Padé transform, dFPT, is obtained by applying the operator \(\textrm{D}_m=({\text{ d }}/{\text{ d }}\nu )^m\) to the nonparametric or parametric nonderivative FPT. The explicit expressions are given in Refs. [24, 25] and need not be repeated here.

In Sect. 3 both the envelopes and components from the nonderivative FPT \((m=0)\) and derivative dFPT \((m\ge 1)\) will be presented. The envelopes and components are from the nonparametric and parametric Padé-based estimations, respectively. In the illustrations by way of graphs, these spectra in the magnitude mode will be denoted by \(\left| \textrm{FPT}\right| _\textrm{Tot,Comp}\) and \(\left| \textrm{D}_m\textrm{FPT}\right| _\textrm{Tot,Comp}\) according to:

$$\begin{aligned} \left| \textrm{FPT}\right| _\textrm{Tot}\equiv & {} \left| \frac{P_K(z^{-1})}{Q_K(z^{-1})}\right| ,\,\,\left| \textrm{D}_m\textrm{FPT}\right| _\textrm{Tot}\equiv \left| \textrm{D}_m\frac{P_K(z^{-1})}{Q_K(z^{-1})}\right| , \end{aligned}$$
(2.5)
$$\begin{aligned} \quad \left| \textrm{FPT}\right| _\textrm{Comp}\equiv & {} \left| \frac{d_kz^{-1}}{z^{-1}-z^{-1}_k}\right| ,\,\,\left| \textrm{D}_m\textrm{FPT}\right| _\textrm{Comp}\equiv \left| \textrm{D}_m\frac{d_kz^{-1}}{z^{-1}-z^{-1}_k}\right| . \end{aligned}$$
(2.6)
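Since the explicit dFPT expressions are given in Refs. [24, 25], the following sketch illustrates only the linewidth-narrowing mechanism, on the simplifying assumption that a resonance near its pole behaves as an isolated complex Lorentzian \(L(\nu )=1/(\nu +i\gamma )\) in the frequency offset \(\nu \) (\(\gamma \) is an illustrative half-width). Its \(m\,\)th derivative is \((-1)^mm!/(\nu +i\gamma )^{m+1},\) whose magnitude has the full width at half maximum \(2\gamma \sqrt{4^{1/(m+1)}-1},\) i.e. \(2\sqrt{3}\gamma ,\,2\gamma ,\,\approx 1.53\gamma ,\,\approx 1.29\gamma \) for \(m=0,1,2,3\):

```python
import math
import numpy as np

def fwhm(nu, y):
    """Full width at half maximum of a single-peaked sampled profile (grid resolution)."""
    above = nu[y >= 0.5 * y.max()]
    return above.max() - above.min()

gamma = 1.0                                   # line half-width parameter (arbitrary units)
nu = np.linspace(-50.0, 50.0, 200001)         # offset from the line centre, step 5e-4

widths = []
for m in range(4):
    # m-th derivative of the complex Lorentzian 1/(nu + i*gamma):
    #   D_m L = (-1)**m * m! / (nu + i*gamma)**(m + 1)
    dm = (-1) ** m * math.factorial(m) / (nu + 1j * gamma) ** (m + 1)
    widths.append(fwhm(nu, np.abs(dm)))       # magnitude mode, as in Eqs. (2.5)-(2.6)

# Closed form for |D_m L|: FWHM = 2*gamma*sqrt(4**(1/(m+1)) - 1)
closed = [2.0 * gamma * math.sqrt(4.0 ** (1.0 / (m + 1)) - 1.0) for m in range(4)]
for m, (w, wc) in enumerate(zip(widths, closed)):
    print(m, round(w, 3), round(wc, 3))
```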

3 Results and discussion

3.1 Time signals from measurements

Encoding of the presently employed time signals or FIDs was performed by the authors of Ref. [26] with in vitro \(\mathrm{{}^1H\,MRS}.\) Their study included forty patients with ovarian tumors, twenty-eight with benign and twelve with malignant lesions. These diagnoses were made by an independent, experienced gynecological pathologist who performed the histopathological classification of the examined ovarian cysts. The task of Ref. [26] was to determine whether such diagnoses could be correlated with the data from in vitro \({}^1\mathrm{H\,MRS}.\)

To this end, a Bruker 600 MHz (\(B_0\approx 14.1\) T) spectrometer for in vitro \({}^1\mathrm{H\,MRS}\) was utilized to encode time signals from the said samples of ovarian cyst fluid. The excised specimens were bathed in the deuterium oxide (\(\mathrm{D_2O}\)) solvent. With this solvent, resonance frequencies of metabolites are scaled upward [26]. For every patient, some 128 FIDs were encoded. These FIDs were averaged for each patient, giving one average FID per patient. The averaged FID has an improved SNR relative to any of the 128 encoded time signals. Without this time signal averaging, it would be practically impossible to extract any meaningful information, even on a qualitative level, by processing any of the separately encoded FIDs for the given sample. It is the averaged FID that is used for reconstructing the frequency spectra.

The dominant water signal in the original FIDs was reduced by presaturation during encoding [26]. Still, this suppression left significant water residuals that produced marked spectral distortions. The acquisition parameters from Ref. [26] were: the full FID length \(N=32768,\) the bandwidth \(\mathrm{BW=6667\, Hz},\) the sampling time \(\tau =1/\mathrm{BW\approx \,0.15\, ms},\) the repetition time \(T_\textrm{R}=1200\,\textrm{ms}\) and the echo times \(T_\textrm{E}=30\) and 136 ms. One zero filling was applied in the time domain to artificially extend the FID length to \(N=65536.\)
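The quoted acquisition parameters can be cross-checked with a few lines of arithmetic, using \(\tau =1/\textrm{BW}\) and \(T=N\tau .\) The conversion to ppm assumes the nominal 600 MHz proton frequency, so that 1 ppm corresponds to 600 Hz; the variable names are ours:

```python
BW = 6667.0            # spectral bandwidth in Hz, from Ref. [26]
tau = 1.0 / BW         # sampling time in s
f0_MHz = 600.0         # nominal 1H frequency of the 600 MHz spectrometer (assumption)

# Total acquisition time T = N * tau (equivalently tau = T / N) for both FID lengths.
T = {N: N * tau for N in (16384, 32768)}
N_zf = 65536                    # FID length after zero filling
spacing_Hz = BW / N_zf          # spacing of the zero-filled Fourier grid, 1/(N_zf * tau)
BW_ppm = BW / f0_MHz            # bandwidth in ppm: Hz divided by the frequency in MHz

print(f"tau = {tau * 1e3:.2f} ms")                      # tau = 0.15 ms
for N, T_N in T.items():
    print(f"N = {N}: T = {T_N:.3f} s")                  # 2.457 s and 4.915 s
print(f"grid spacing = {spacing_Hz:.4f} Hz")            # grid spacing = 0.1017 Hz
print(f"BW = {BW_ppm:.2f} ppm")                         # BW = 11.11 ppm
```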

As an internal reference, the sodium salt of 3-(trimethylsilyl)-2,2,3,3-tetradeuteropropionic acid (TSP) was added to each sample. The TSP molecules served a twofold calibration purpose. First, the resonance frequencies of all the metabolites are measured relative to the position of the TSP peak (0 ppm), where ppm (parts per million) denotes the dimensionless frequency unit. Second, the metabolite concentrations are determined as a percentage of the TSP concentration. Each concentration took into account the number of protons contributing to the given resonance and was expressed in biophysical units (micromoles per gram of cyst fluid, \(\mu \)mol/g). The TSP concentration is deduced from the TSP molecular weight, the known amount of TSP added to the given sample and the TSP peak area.

By post-processing the FFT spectra, the authors of Ref. [26] were able to assess only 12 metabolite concentrations from the approximate peak areas and the reference TSP concentration. Post-processing of the complex-valued FFT spectra included three steps. First, the real part of the given complex FFT envelope was phase-corrected. Second, the background baseline was manually adjusted. Third, the resulting Fourier envelope was fitted to a sum of absorptive Lorentzian lineshapes. The phase and background modifications of the original real part of the complex FFT envelope were made to approximately mimic the sought positive-definite absorptive lineshape profile.

Both modifications are rough and prone to errors. Multiplication of a complex envelope by a single phase factor cannot make the resulting spectrum positive-definite without considerably lifting the background baseline. Such a forcibly elevated background can be 'manually' corrected only in a very approximate manner, and some unphysical baseline will always remain. Moreover, even a seemingly isolated peak can be structured, in which case attributing its whole area to one resonance would overestimate the true peak area as well as the corresponding concentration.
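One simple mechanism behind this limitation can be shown with a toy spectrum (all values hypothetical): two well-separated Lorentzian lines with unequal phases. A single global phase correction can make at most one of them purely absorptive, so for every trial phase the real part retains a negative dispersive lobe. For a single line with residual phase \(\alpha \) and half-width \(\gamma ,\) the most negative value of the real part is \((\cos \alpha -1)/(2\gamma ),\) so the best compromise here stays close to \((\cos (\pi /4)-1)/(2\gamma )\approx -0.15/\gamma \):

```python
import numpy as np

gamma = 1.0
x = np.arange(-300.0, 300.0, 0.01)              # frequency axis (arbitrary units)
centres = (-100.0, 100.0)                       # hypothetical line positions
phases = (0.0, np.pi / 2)                       # hypothetical, unequal line phases

# Each line e^{i theta}/(gamma - i (x - x_k)) has an absorptive (positive) real part
# only when its total phase is zero.
S = sum(np.exp(1j * th) / (gamma - 1j * (x - xc)) for xc, th in zip(centres, phases))

phis = np.deg2rad(np.arange(0, 360))            # trial global phase corrections, 1 degree apart
min_real = np.array([np.min((np.exp(1j * phi) * S).real) for phi in phis])
best = phis[np.argmax(min_real)]
print(f"best phase {np.rad2deg(best):.0f} deg, least negative minimum {min_real.max():.3f}")
```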

In Ref. [26], the fitted Lorentzians were all assumed to have the same width (estimated at about 1.0 Hz). However, more refined signal processing, applying the Padé-based estimations to the same FIDs, showed that the lineshape widths vary from one resonance to another across the entire Nyquist range [24, 25]. Inaccuracies in the resonance widths adversely affect the peak heights. This inevitably yields unreliable peak areas and concentrations.

Importantly, it was reported in Refs. [24,25,26] that the concentrations of several metabolites are considerably elevated in malignant compared to benign specimens of ovarian cyst fluid. Salient examples of such metabolites are isoleucine (Iso), valine (Val), threonine (Thr), alanine (Ala), lactate (Lac), lysine (Lys), methionine (Met), glutamine (Gln) and total choline (tCho). These biomolecules participate in different pathways of ovarian cyst metabolism. Hence, their unequivocal identification and quantification using \(\mathrm{{}^1H\,MRS}\) could provide crucial information about the metabolic state of ovarian cysts and the surrounding tissues/biofluids.

The presence of large molecules (proteins, ...) in biofluids usually obscures the visualization of many smaller constituents, including, e.g., amino acids (isoleucine, leucine, lysine, valine). Yet, in Ref. [26], these amino acids were visualized and quantified. This was made possible by a deproteinization procedure, which involved centrifugation of the cyst fluid (during sample preparation, prior to FID encoding) through a 10 kD filter. Still, in the 'deproteinized' specimens, all the proteins with molecular masses below the 10 kD cut-off could perturb or obscure the lineshape profiles of the neighboring lighter metabolites.

Two averaged time signals were kindly provided to us by the authors of Ref. [26]. These FIDs are for the ovarian cyst fluid from two patients, one diagnosed with serous cystadenoma (benign) and the other with serous cystadenocarcinoma (malignant). Each of these FIDs has length \(N=16384\) and was encoded with \(T_\textrm{E}=30\,\textrm{ms}\) (the other acquisition parameters are the same as those already mentioned for Ref. [26]). We subsequently zero-filled both FIDs to 65536 points so as to have the same zero-padded FID length as in Ref. [26].

3.2 Spectra from the Padé-based computations

We will analyze the spectra computed from the averaged FIDs for both the benign and malignant samples. Signal processing will be carried out by the nonderivative FPT and the derivative dFPT, each in its nonparametric and parametric variant. The main focus will be on the nonparametric dFPT, which is to be validated here by the parametric dFPT. Of course, prior to obtaining the spectra in the dFPT, the seed spectra in the FPT must be generated. It is the nonderivative spectrum in the FPT that is subjected to the \(m\,\)th derivative operator \(\textrm{D}_m=({\text{ d }}/{\text{ d }}\nu )^m\) to arrive at the derivative spectrum in the dFPT \((m\ge 1).\)

All the reconstructed spectra will be displayed in the phase-insensitive magnitude mode, which is by definition positive-definite. As in the related earlier studies [24, 25], the model order \(K=3000\) is taken for the nonparametric and parametric FPT. By implication, all the spectra in the nonparametric and parametric dFPT also refer to the same \(K=3000.\) This does not mean that any of the computed Padé spectra will exhibit 3000 distinct resonances. Quite the contrary, the number of visible resonances is much smaller than 3000. The reason is that for any fixed K,  some of the K reconstructed resonances are stable (genuine, physical) while the others are unstable (spurious, unphysical) when the spectra are recomputed for several model orders around the selected K.

The Padé processing is self-guarded against spurious (noisy) reconstructions. This is secured by signal-noise separation (SNS). The SNS approach to SNR improvement relies upon the fact that the same spuriousness is contained in both the numerator \(P_K\) and denominator \(Q_K\) polynomials of the nonderivative FPT spectrum \(P_K/Q_K\). The outcome is the appearance of the so-called Froissart doublets (pairs of nearly coincident noisy poles and zeros). Consequently, any spuriousness is washed out of the quotient spectrum \(P_K/Q_K\) by the powerful mechanism of pole-zero cancellation. In other words, the noisy poles (the roots of \(Q_K\)) and the noisy zeros (the roots of \(P_K\)) cancel each other. With \(K_\textrm{s}\) spurious resonances eliminated from \(P_K/Q_K,\) the final noise-cleaned spectrum \(P_{K_\textrm{g}}/Q_{K_\textrm{g}}\) consists mainly of the \(K_\textrm{g}\) genuine resonances \((K_\textrm{g}=K-K_\textrm{s}).\)
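A toy illustration of pole-zero cancellation (all polynomials and the doublet position are hypothetical; in measured data Froissart doublets arise from noise rather than being inserted by hand): multiplying a genuine quotient \(P/Q\) by \((x-a)/(x-a-\varepsilon )\) adds a nearly coincident zero and pole. By Eq. (2.4), the spurious pole at \(x_\textrm{s}=a+\varepsilon \) has residue \(\varepsilon P(x_\textrm{s})/Q(x_\textrm{s}),\) which vanishes with \(\varepsilon ,\) and the quotient elsewhere changes only by the relative amount \(|\varepsilon /(x-a-\varepsilon )|\):

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

# Hypothetical "genuine" quotient P/Q in x = z^{-1} (ascending coefficients).
p = npoly.polyfromroots([0.3 + 0.2j, -0.5j, 0.1])                            # zeros
q = npoly.polyfromroots([0.9 * np.exp(0.6j), 0.8 * np.exp(-1.9j), 0.95 * np.exp(2.8j)])  # poles

# Inject a Froissart doublet: a zero at a and a nearly coincident spurious pole at a + eps.
a, eps = 0.7 * np.exp(1.2j), 1e-6
p2 = npoly.polymul(p, [-a, 1.0])            # P(x) * (x - a)
q2 = npoly.polymul(q, [-(a + eps), 1.0])    # Q(x) * (x - a - eps)

# Residue of the spurious pole via Eq. (2.4); analytically it equals eps * P(x_s)/Q(x_s).
xs = a + eps
res_spurious = npoly.polyval(xs, p2) / npoly.polyval(xs, npoly.polyder(q2))
res_expected = eps * npoly.polyval(xs, p) / npoly.polyval(xs, q)

# Away from the doublet the two quotients practically coincide (pole-zero cancellation).
x = np.exp(-2j * np.pi * np.linspace(0.0, 1.0, 400, endpoint=False))   # unit circle
ratio_ref = npoly.polyval(x, p) / npoly.polyval(x, q)
ratio_dbl = npoly.polyval(x, p2) / npoly.polyval(x, q2)
rel_change = np.max(np.abs(ratio_dbl / ratio_ref - 1.0))
print(f"|spurious residue| = {abs(res_spurious):.2e}, max relative change = {rel_change:.2e}")
```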

Three frequency bands will be investigated, one wider (0.870-5.125 ppm) and two narrower (2.80-3.05, 4.65-4.75 ppm). The wider window 0.870-5.125 ppm (benign, malignant) contains a myriad of metabolites resonating at frequencies between and around the doublet and quartet of lactate, Lac(d) and Lac(q), that are located at 1.41 and 4.36 ppm, respectively. With the two narrower intervals, the focus is on the citrate quartet (2.80-3.05 ppm, benign, malignant) and the water residual (4.65-4.75 ppm, benign).

Previously [24, 25], the wider window was segmented into several sub-regions with the main emphasis on the two recognized cancer biomarkers, lactates and cholines, including their environments. Therein, the lactate and choline abundance levels were found to be notably higher in the malignant than in the benign samples. In the current work, we intend to peer into the citrate quartet for the benign and malignant cyst fluid samples to see whether this metabolite could also be a cancer biomarker for the ovary as is already the case for e.g. the prostate [27,28,29,30,31,32,33].

Fig. 1

Proton MRS for ovarian cyst fluid (benign, malignant). Spectral intensities (ordinates) in arbitrary units, au. Chemical shifts (abscissae) in parts per million, ppm. Nonderivative (a, c, d) and first derivative (b, e) magnitude envelopes in the nonparametric FPT and dFPT, respectively, at 0.870-5.125 ppm. Samples: malignant (a, b, d, e) and benign (c). The top graphs on panels (a, b) are scaled upward by 7000 and 17500 au, respectively. Hereafter, dMRS denotes derivative magnetic resonance spectroscopy. For details, see the main text (color online)

The illustrations, presented in four figures, show the envelopes and components from shape and parameter estimations by the FPT and dFPT. Figures 1 (0.870-5.125 ppm) and 4 (2.80-3.05 ppm) concern both the benign and malignant samples, whereas Figs. 2 and 3 (4.65-4.75 ppm) concern the benign specimen alone. In all four figures, the nonparametric estimations are the main focus. The parametric estimations appear in the figures dealing with the two narrower bands of chemical shifts, 2.80-3.05 and 4.65-4.75 ppm. All the plots display the spectral intensities in arbitrary units (au) on the ordinates as a function of dimensionless sweep frequencies, or chemical shifts, in ppm on the abscissae. Both the original and reduced scales on the ordinates are used. In the spirit of sequential visualization/quantification, the reduced scales allow some of the weaker resonances to emerge from the obscurity they have in the plots with the full dynamic range on the ordinates.

In Fig. 1 (shape estimations throughout), the envelopes for the malignant samples are on panels (a,b,d,e). Within the same figure, the envelopes for the benign sample are on panel (c). The nonderivative and first derivative envelopes are on panels (a,c,d) and (b,e), respectively. The FPT as a shape estimator gives only the envelopes. When such envelopes or total shape spectra are subjected to the operator \(\textrm{D}_m\, (m\ge 1),\) within e.g. the nonparametric dFPT, the resulting spectra exhibit better resolution and SNR. Nevertheless, in general, such derivative spectra are still within the envelope realm of the lineshape profiles. In other words, the derivative envelopes may, in principle, differ from the components (partial shape spectra) in the parametric dFPT. The actual extent of such differences cannot be known prior to computations of spectra. This discrepancy can be determined by seeing how reliably the derivative envelopes from the nonparametric dFPT describe the individual resonances assignable to the metabolites that are physically present in the sample. For such a validity benchmarking of the nonparametric dFPT, the components from the parametric dFPT come to the rescue.

The wider frequency band 0.870-5.125 ppm of Fig. 1 hosts all the recommended segments around the chemical shifts 1.5, 1.7, 2.8, 3.0, 3.2 and 3.3 ppm [26, 34]. The recommendations from Refs. [26, 34] on these particular resonance frequencies were based on the anticipation that, besides the recognized cancer biomarkers (lactates, cholines, ...), some other metabolites might also be established as potential cancer biomarkers. This would be helpful for enlarging the possibility for improved differential diagnoses (benign versus malignant). Indeed, as stated, it was shown in Refs. [24,25,26] that e.g. isoleucine, valine, threonine, alanine, lysine, methionine and glutamine could be added to lactates and cholines to form an extended list of cancer biomarkers. We will expand on this issue in Fig. 4 to see whether also the citrate metabolite could supplement the existing set of cancer biomarkers for the ovary.

First shown in Fig. 1, on panels (a) and (b), are the envelopes from the nonderivative FPT and the first derivative dFPT for the malignant sample in the surroundings of Lac(d) and Lac(q), within the sub-bands 0.870-2.75 and 2.84-5.125 ppm, respectively. The two tallest resonances in the nonderivative FPT spectra are Lac(d) and Lac(q), with peak heights of about 6041 and 1597 au, respectively. To enhance visibility, each of these two panels is partitioned in the same way: the bottom and top parts of panels (a) and (b) are for Lac(d) and Lac(q) within the bands 0.87-2.75 and 2.84-5.125 ppm on the abscissae, respectively. Both plots on panels (a) and (b) refer to the same nontruncated ordinates, i.e. with the full dynamic intensity ranges. However, for better transparency, the ordinates for the top parts are scaled upward by 7000 and 17500 au on panels (a) and (b) for the nonderivative FPT and first derivative dFPT, respectively.

Some notable differences are evident from a comparison of the nonderivative FPT (a) and the first derivative dFPT (b). Owing to the large dynamic range of the resonance intensities, not much can be discerned besides the dominant lactate peaks, foremost the Lac(d) resonance near 1.41 ppm and then the Lac(q) resonance at 4.36 ppm. The wider bump of the water residual is seen only for the nonderivative FPT (a). This bulky spectral structure practically disappears from view in the first derivative dFPT (b), where the label \(\mathrm{H_2O}\) is formally retained merely to indicate the expected location of the water residual. As a result, the upper part of Fig. 1b contains Lac(q) and basically nothing else recognizable in the surroundings of the lactate quartet.

On the other hand, the neighborhoods of Lac(d) in the lower parts of Fig. 1 have also undergone some marked alterations when passing from the envelopes in the FPT (a) to the dFPT (b). The most pronounced changes are at the base of the Lac(d) resonance, which is visibly more localized in the dFPT (b) than in the FPT (a). This improvement is due to a noticeable flattening of the background baseline in the dFPT (b) relative to that in the FPT (a), which yields an improved SNR in the dFPT (b) with respect to the FPT (a). Moreover, the Lac(d) peak is about 2.5 times taller in the dFPT (b) than in the FPT (a). The implication is that the Lac(d) line is about 2.5 times narrower in the dFPT (b) than in the FPT (a). Linewidth narrowing signifies improved resolution. Thus, both resolution and SNR are enhanced in the first derivative dFPT (b) relative to the nonderivative FPT (a). It then follows from Fig. 1 that the two main fingerprints of the successful performance of the dFPT (simultaneously improved resolution and SNR) are manifested already in the nonparametrically reconstructed first derivative envelope.

The envelope in the nonderivative FPT for the benign sample on panel (c) of Fig. 1 is also shown with the full dynamic range of the intensities. Here, the spectral background throughout 0.870-5.125 ppm is relatively low. Yet, the wide bottoms of Lac(d) and \(\mathrm{H_2O}\) in particular lead to a notable lifting of the background baseline which, in turn, perturbs the neighboring resonances. As discussed, only 3 labeled metabolites are seen for the malignant specimen in the envelope from the FPT (a). By contrast, the envelope for the benign sample (c) has many visible resonances, some of which are provisionally assigned to at least 26 metabolites. Such a sizable metabolite assembly is visible in spite of the full dynamic range of the spectral intensities shown on the ordinate of panel (c). The reason is the weaker Lac(d) and Lac(q) resonances, which attain peak heights of only about 326 and 88 au, respectively. Note that the full names of all the metabolites associated with the acronyms shown in Fig. 1 can be found in Refs. [24, 25].

In other words, the Lac(d) and Lac(q) resonances in the nonderivative envelopes are about 18.53 (6041/326) and 18.15 (1597/88) times stronger, respectively, for the malignant (a) than for the benign (c) sample. As stated, in the envelope for the malignant case (a), the Lac(d) and Lac(q) peaks are the two strongest resonances. This is not true for the envelope in the benign case (c) where, within e.g. the band 1.49-4.75 ppm, a number of resonances (Ala, Suc, Cit, Cr, Tau, m-Ins, Gly, \(\mathrm{H_2O}\)) are either comparable to or stronger than the Lac(q) peak. Moreover, the second strongest resonance for the benign sample is the citrate quartet, Cit(q), whose average peak height attains about 2/3 of the peak height of the dominant Lac(d) resonance on panel (c). In the frequency sub-band 0.85-1.61 ppm around Lac(d) for the benign case (c), several resonances are visibly delineated, e.g. Leu(d,t), Iso(d,t), Val(dd), \(\beta -\)HB(d), Thr(d) and Ala(d). By contrast, none of these are appreciable for the malignant case (a).

However, this metabolite paucity for the malignant sample (a) in the FPT envelope is only apparent, as it is mainly due to the use of the nontruncated ordinate. The situation can be improved by resorting to sequential visualization, as suggested in Refs. [24, 25]. With this option, once the dominant Lac(d) and Lac(q) peaks have been visualized with sufficient initial information, the remaining resonances can be made to pop up in various ways. This can be done by e.g. appropriately truncating the given ordinate, dividing the small part of the lineshape around Lac(d) by a factor (say 25), or using two different scales for the ordinates, one for Lac(d) and the other for the remaining resonances [24, 25].
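The first two of these options amount to simple array operations on a sampled envelope. The sketch below is a minimal illustration under the assumption that the envelope is available as intensities `y` (au) on a chemical-shift grid `x` (ppm); the function names are hypothetical, and all toy values except the 6041 au Lac(d) peak height are invented. The third option (two ordinate scales) is a plotting choice, such as a secondary axis, rather than a data transformation, so it is not shown.

```python
import numpy as np

def truncate_ordinate(y, cap):
    """Option 1: clip intensities at `cap` (au); tall peaks are cut off, weak ones fill the view."""
    return np.minimum(y, cap)

def divide_window(x, y, lo, hi, factor=25.0):
    """Option 2: divide only the part of the lineshape with lo <= x <= hi (ppm) by `factor`."""
    out = np.array(y, dtype=float, copy=True)   # leave the input envelope untouched
    mask = (x >= lo) & (x <= hi)
    out[mask] /= factor
    return out

# Toy envelope samples (hypothetical, except the 6041 au Lac(d) peak height).
x = np.array([1.00, 1.41, 1.51, 2.00])        # chemical shift, ppm
y = np.array([10.0, 6041.0, 300.0, 20.0])     # intensity, au
print(truncate_ordinate(y, 400.0))            # the Lac(d) sample is capped at 400 au
print(divide_window(x, y, 1.35, 1.45))        # the Lac(d) sample becomes 6041/25 = 241.64 au
```

Both operations change only what is displayed; neither alters the underlying spectral information, which is why they cannot by themselves separate overlapping resonances.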

The first of these possibilities (truncating the ordinate) is illustrated on panel (d: malignant) for the nonderivative FPT. Therein, the ordinate is truncated at 400 au to match the dynamic range of the nontruncated ordinate from panel (c: benign). Sequential visualization on panel (d) uncovers many resonances that were hidden on panel (a). Even at a cursory look at panel (d), one can at once identify at least 36 resonances (and/or resonance complexes), provisionally assigned therein to certain known and unknown metabolites. However, such a visual identification also reveals the troubles with nonderivative estimations. These are seen in the marked spectral distortions caused by the extended tails of Lac(d), Lac(q) and \(\mathrm{H_2O}.\)

These deformations are reflected in the elevated background baseline on which the surrounding weaker resonances ride with their skewed lineshape profiles. Even between Lac(d) and Lac(q), the bottoms (bases) of most resonances reside on some smaller bumpy hills. The spectral configuration is further complicated because some bumpy hills are populated by resonance clusters/multiplets, e.g. Lys(m), Gln(m), His(m) and Pro(m). These and other spectral obstacles, invariably encountered in both in vitro and in vivo MRS, are shared by all the existing conventional nonderivative processing methods applied to FIDs encoded at any of the currently available magnetic field strengths (ranging from 1.5 to 14.1 T), on spectrometers and clinical scanners alike.

Sequential visualization makes many spectral details emerge as if panel (d) were a zoomed version of panel (a). However, both panels (a) and (d) actually span the same frequency window, 0.870-5.125 ppm. The better transparency of panel (d) relative to (a) is due to the use of the truncated and the nontruncated ordinates, respectively. Clearly, sequential visualization by itself can neither deconvolve amalgamated spectral structures nor autonomously split apart overlapped resonances. However, sequential visualization serves two practical purposes. One is to highlight the regions where unfolding convolved complexes and separating tightly packed peaks could be of potential diagnostic interest. The other is to expose some of the main limitations of nonderivative estimations. The idea behind this double goal is to eventually pave the way for spectral improvements where they are diagnostically needed the most.

Panel (d) accomplishes this twofold task of the first stage of sequential visualization. However, to go one step further and, possibly, realize the anticipated improvements in resolution and SNR, it helps to understand why overlapped resonances appear in the first place. There are two reasons, one mathematical and one physical.

\(\bullet \) Mathematical: A frequency spectrum is an integral transform over time of an FID. Integration is a smoothing operator. It smears out the finer details and intertwines the individual harmonics corresponding to different metabolites in the sample. Such smoothing/intertwining in the time domain generates one or more overlapped resonances in the frequency domain.

\(\bullet \) Physical: Biomedical samples from various tissues or body fluids (urine, blood, ...), organ biopsies, cell suspensions or glandular secretions abound in metabolites resonating with very close effective transverse (spin-spin) relaxation times \(T^\star _2\) due to their similar reaction rates in various metabolic pathways. Two or more adjacent, closely spaced peaks with similar values of \(T^\star _2\) appear practically as a single resonance, since the linewidth is proportional to \(1/T^\star _2.\)
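The physical argument can be made quantitative with the standard absorption-mode Lorentzian relation: the full width at half maximum is \(1/(\pi T^\star _2)\) in Hz, i.e. that value divided by the spectrometer frequency in MHz when expressed in ppm. The sketch below also applies the classical criterion that two equal-height absorption Lorentzians with a common width show no dip between them when their separation does not exceed FWHM\(/\sqrt{3}\). The 600 MHz spectrometer frequency (about 14.1 T for protons), \(T^\star _2=0.1\) s and the chemical shifts are hypothetical illustration values, not parameters of the analyzed FIDs.

```python
import math

def fwhm_hz(t2_star_s):
    """Absorption-mode Lorentzian full width at half maximum, 1/(pi*T2*), in Hz."""
    return 1.0 / (math.pi * t2_star_s)

def fwhm_ppm(t2_star_s, spectrometer_mhz):
    """Same width on the chemical-shift axis: Hz divided by the spectrometer frequency in MHz."""
    return fwhm_hz(t2_star_s) / spectrometer_mhz

def appear_as_one(shift_a_ppm, shift_b_ppm, t2_star_s, spectrometer_mhz):
    """True when two equal-height absorption Lorentzians with a common T2* show no dip,
    i.e. when their separation does not exceed FWHM/sqrt(3)."""
    separation = abs(shift_a_ppm - shift_b_ppm)
    return separation <= fwhm_ppm(t2_star_s, spectrometer_mhz) / math.sqrt(3.0)

# Hypothetical values: T2* = 0.1 s on a 600 MHz proton spectrometer.
print(fwhm_hz(0.1))                                # about 3.18 Hz
print(fwhm_ppm(0.1, 600.0))                        # about 0.0053 ppm
print(appear_as_one(1.410, 1.412, 0.1, 600.0))     # 0.002 ppm apart: seen as one peak
print(appear_as_one(1.410, 1.420, 0.1, 600.0))     # 0.010 ppm apart: two peaks
```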

In particular, the mathematical cause of the overlapping resonances hints at the feasibility of separating them. To this end, all that is needed is to mitigate the smoothing effect of the integration operator. It is the derivative operator that can partially undo the smoothing action of integration. The derivative operator \(\textrm{D}_m=({\text{ d }}/{\text{ d }}\nu )^m,\) applied to a frequency-dependent envelope spectrum, is very sensitive to any slope change in the lineshape profiles. As a consequence, e.g. a dip (a node, a minimum) can appear between two tightly overlapped peaks that, prior to the action of \(\textrm{D}_m\), appeared as if they were a single resonance. This amounts to resolving the overlapped resonances. Two resonances that are closely glued together can be split apart, with their positions preserved, by reducing their widths, in which case their heights are automatically increased. Such a tandem effect is precisely the result of the action of the derivative operator \(\textrm{D}_m\) on an envelope spectrum. The net outcome is then an enhancement of both resolution and SNR.
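The dip mechanism can be checked on a toy model. Assuming, as a simplified stand-in for Padé components and not as the dFPT itself, that each resonance contributes \(d_k/(\nu _k-\nu )\) to the complex spectrum with \(\textrm{Im}(\nu _k)=\gamma >0\), the derivative has the closed form \(\textrm{D}_m[d_k/(\nu _k-\nu )]=m!\,d_k/(\nu _k-\nu )^{m+1}\). For two hypothetical unit-amplitude components separated by \(0.8\gamma\), the magnitude envelope has a single maximum for \(m=0\) but two maxima with a dip between them for \(m=1\) and \(m=2\). The function names are illustrative.

```python
import math
import numpy as np

def derivative_spectrum(nu, freqs, amps, m):
    """Magnitude of D_m applied to F(nu) = sum_k d_k / (nu_k - nu) (toy Lorentzian model).

    Uses D_m[d / (nu_k - nu)] = m! d / (nu_k - nu)**(m + 1); Im(nu_k) > 0 is the half-width.
    """
    total = np.zeros_like(nu, dtype=complex)
    for nu_k, d_k in zip(freqs, amps):
        total += math.factorial(m) * d_k / (nu_k - nu) ** (m + 1)
    return np.abs(total)

def count_peaks(y):
    """Number of strict interior local maxima of a sampled profile."""
    inner = y[1:-1]
    return int(np.sum((inner > y[:-2]) & (inner > y[2:])))

gamma = 1.0                          # common half-width, Im(nu_k)
a = 0.4 * gamma                      # half-separation: components at -a and +a
freqs = [-a + 1j * gamma, a + 1j * gamma]
amps = [1.0, 1.0]
nu = np.linspace(-4.0, 4.0, 4001)    # odd number of points, so nu = 0 is sampled

for m in range(3):
    print(m, count_peaks(derivative_spectrum(nu, freqs, amps, m)))
# m = 0 -> 1 peak; m = 1 -> 2 peaks; m = 2 -> 2 peaks
```

For this symmetric pair, the \(m=0\) magnitude envelope stays single-peaked as long as the separation is below about \(0.97\gamma\), whereas the first derivative already splits it for separations above about \(0.68\gamma\).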

The enhancement of resolution and SNR by \(\textrm{D}_m\) can be appreciated from the envelope in the first derivative dFPT for the malignant sample on panel (e) of Fig. 1. Comparing the two Padé envelopes, the nonderivative (d) and the first derivative (e), a notable gain in the latter spectrum is obvious. The gain is precisely the sought simultaneous improvement of resolution and SNR, made plainly visible by the narrower peak widths and the flatter background baseline in panel (e) relative to (d). Sequential visualization is also exploited in panel (e) by truncating the ordinate. Here, the maximal ordinate is 700 au compared to 400 au on panel (d). The ordinate on panel (e) is not reduced to 400 au to match its companion on panel (d), because it is preferable to show all the resonances close to Lac(d) with their full intensities, beginning with the strongest Lac(d) neighbor, Ala(d), near 1.51 ppm. Clearly, with the ordinate on panel (e) capped at 700 au, the top parts of the Lac(d) and Lac(q) resonances had to be cut off.

The most pronounced width narrowing and background flattening are seen on panel (e) around Lac(d), Lac(q) and \(\mathrm{H_2O}.\) This is all the more remarkable by reference to panel (d) for the nonderivative envelope. On panel (d), the bottom parts of the Lac(d), Lac(q) and \(\mathrm{H_2O}\) lineshapes are wide and significantly elevated above the chemical shift axis. Moreover, panel (d) shows that the tails of Lac(d), Lac(q) and \(\mathrm{H_2O}\) extend notably away from their peak centers. This leads to considerable deformations of the neighboring resonances, whose concentration levels become erroneously increased on panel (d).

Such undesirable outcomes of the nonderivative envelope (d) are mitigated to a large extent by the first derivative envelope (e). Thus, panel (e) shows much better delineated individual resonances throughout the whole displayed band, 0.870-5.125 ppm. In particular, the background level is enormously lowered on panel (e) with respect to (d), especially around Lac(d), Lac(q) and \(\mathrm{H_2O}.\) As a consequence, the nearby resonances become completely visible, as best seen with the lineshapes of the amino acids Iso(d,t), Leu(d,t), Val(dd), Thr(d) and Ala(d), as well as \(\beta -\)HB(d), within the band 0.9-1.6 ppm around Lac(d). Moreover, with the reduced bumpy hills on panel (e), spectral visibility is also enhanced in other frequency bands, e.g. at 1.67-1.78 ppm for lysine, 2.42-2.52 ppm for glutamine, 3.18-3.87 ppm for glucose, etc.

Thus far, the citrate molecules have been mentioned merely in passing. For the time being, it suffices to note only that the citrate resonance appears prominently on panels (c-e) as a quartet, Cit(q). As already announced, special attention will be devoted to the citrate metabolite in Fig. 4 and, to keep the perspective of the upcoming exposition, this is also signaled by the red highlighting around Cit(q) on panels (c-e).

The analysis of the results from Fig. 1 can be summarized as follows:

\(\bullet \) (i) Markedly narrower bottom parts of Lac(d), Lac(q) and \(\mathrm{H_2O}\) for \(m=1\) (e) than for \(m=0\) (d).

\(\bullet \) (ii) A notably better resolution for \(m=1\) (e) than for \(m=0\) (d), as best exemplified by the sub-band 0.87-1.52 ppm near Lac(d).

These accomplishments by the nonparametric first derivative dFPT can be understood by the arguments that run as follows:

\(\bullet \) (i’) Linewidth narrowing in a derivative envelope widens the intensity gap between the sharper and broader resonances. With increasing derivative order m, the peak height grows faster for thinner than for wider lineshapes. As such, in the dFPT, the long tails of resonances are effectively cut off by the action of the \(\textrm{D}_m\, (m\ge 1)\) operator. This “wave-packet-type” localization yields well-tightened resonances. The ensuing effect is also manifested on panel (e) in a marked narrowing of the bottom parts of spectral peaks, most prominently for Lac(d), Lac(q) and \(\mathrm{H_2O},\) as indicated in (i). In the MRS literature using encoded FIDs, the computed spectral envelopes are invariably contaminated by rolling background baselines. These consist of noise, broad bumpy hills (stemming from heavy macromolecules, e.g. lipids, proteins, ...) and long tails of the water and other stronger resonances. The background is vastly diminished in derivative spectra, as also seen on panel (e).

\(\bullet \) (ii’) There is more to the linewidth narrowing phenomenon, which is the key feature of derivative estimations. Namely, narrowed linewidths lead to the splitting of overlapping resonances. This is conveyed by panel (e) through an improved delineation of the Lac(d) environment (1.0-1.52 ppm), which comprises several doublets, e.g. Ala(d), Thr(d), \(\beta -\)HB(d) and Val(dd), as highlighted in (ii).

The magnitude mode of the \(m\,\)th derivative of a complex Lorentzian frequency spectrum exhibits much narrower lineshapes than the associated nonderivative profile \((m=0)\) [14]. Simultaneously, the derivative peak heights are considerably taller than their nonderivative counterparts. The narrower the peak width, the taller the peak height. This causal relationship is dictated by the definition of the peak height as the quotient of the FID amplitude \(|d_k|\) and the peak width, which is proportional to the imaginary part of the resonance frequency, \(\textrm{Im}(\nu _k)\) [14, 15].
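Under the simplifying assumption that a single resonance contributes \(d_k/(\nu _k-\nu )\) to the complex spectrum (a Lorentzian stand-in, with \(\textrm{Re}(\nu _k)\) the position and \(\gamma =\textrm{Im}(\nu _k)\) the half-width), the derivative acts in closed form: \(\textrm{D}_m[d_k/(\nu _k-\nu )]=m!\,d_k/(\nu _k-\nu )^{m+1}\). Its magnitude peaks at \(m!\,|d_k|/\gamma ^{m+1}\), which reduces to the quotient \(|d_k|/\textrm{Im}(\nu _k)\) for \(m=0\), and its full width at half maximum is \(2\gamma \sqrt{2^{2/(m+1)}-1}\). The sketch below tabulates these closed forms; the function names and the \(m!\) normalization are assumptions of this illustration, not taken from Refs. [14, 15]. It shows the width shrinking with m, the height ratio of a narrow (\(\gamma =1\)) to a broad (\(\gamma =3\)) resonance growing as \(3^{m+1}\), and the tail five half-widths away falling off ever faster.

```python
import math

def peak_height(d_abs, gamma, m):
    """Height m!|d|/gamma**(m+1) of |D_m[d/(nu_k - nu)]|, where gamma = Im(nu_k)."""
    return math.factorial(m) * d_abs / gamma ** (m + 1)

def fwhm(gamma, m):
    """Full width at half maximum of the same magnitude profile."""
    return 2.0 * gamma * math.sqrt(2.0 ** (2.0 / (m + 1)) - 1.0)

def relative_tail(offset, m):
    """Profile value at nu - Re(nu_k) = offset * gamma, divided by the peak height."""
    return (1.0 + offset ** 2) ** (-(m + 1) / 2.0)

# Hypothetical pair with equal |d|: narrow (gamma = 1) versus broad (gamma = 3).
for m in (0, 1, 2, 4, 8):
    width = fwhm(1.0, m)
    height_ratio = peak_height(1.0, 1.0, m) / peak_height(1.0, 3.0, m)
    tail = relative_tail(5.0, m)   # five half-widths away from the centre
    print(f"m={m}: FWHM={width:.3f}, narrow/broad height={height_ratio:.0f}, tail={tail:.1e}")
```

The growing height ratio is the "the wider the peak, the slower the peak height rise" behaviour in its simplest form: a broad underlying component is progressively pushed into the baseline relative to a sharp one.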

An improved delineation and splitting of overlapping resonances translates directly into increased frequency resolution. It also implies a stronger content of physical resonances and, at the same time, a weaker noisy background baseline. The result is a higher SNR. This twofold accomplishment shows that already the first derivative dFPT (e) clearly outperforms the nonderivative FPT (d) with respect to the two main stumbling blocks of MRS, resolution and SNR. This outcome of the analysis of the results in Fig. 1 is the driving motivation to apply higher derivative orders \((m\ge 2)\) within the dFPT, as previously elaborated in Refs. [24, 25] for the frequency sub-bands containing the diagnostically most relevant metabolites.

One of the very important parts of signal processing for samples of body fluids and tissues is the treatment of the water residual, which in the present spectra is located near 4.71 ppm (Fig. 1), as was also the case in Refs. [24, 25]. Because of the dominance of water in human biofluids and tissues, the unsuppressed water peak in spectra is huge, exceeding all the other resonance intensities by a factor of the order of 10000. To make the other metabolites at least partially visible, water is regularly suppressed during FID encoding. For the presently analyzed time signals, this was also done in Ref. [26] for both the benign and malignant specimens. Nevertheless, after water suppression, the residual structure around 4.71 ppm is still appreciable, as seen in Fig. 1 on panels (a: malignant) and (c: benign), even with the nontruncated ordinates. For instance, on panel (c), the water residual resonance cluster is more intense than Lac(q), as stated earlier. Moreover, with the exception of Tau(t), Cit(q), Suc(s) and Lac(d), the water remnant on panel (c) is stronger than all the other resonances at 0.870-5.125 ppm.

Consequently, the extended tail of the water residual on panel (c) hampers the analysis of the other resonances. It is, therefore, very important to develop a reliable treatment of such an intense water residual. The customary procedure in the MRS literature is to fit the water residual with about ten arbitrary Lorentzian lineshapes. Subsequently, the extracted adjustable parameters are used to build a spectrum, which is subtracted from the Fourier envelope around the water residual. However, this practice is unreliable because it may eliminate some of the diagnostically relevant resonances lying within the water residual region. Moreover, such a procedure invariably modifies the estimated concentrations of the metabolites in the vicinity of the water residual.

A fundamentally different procedure that treats the water residual with no arbitrariness whatsoever has recently been proposed in Ref. [25]. The novel approach uses derivative processing by the nonparametric dFPT, illustrated here for the benign sample in Figs. 2 and 3 (for the malignant sample, see Ref. [25]). Fig. 2 shows only shape estimations (\(0\le m\le 10),\) whereas Fig. 3 deals with both shape and parameter estimations (\(0\le m\le 4).\)

Fig. 2

Proton MRS for ovarian benign cyst fluid. Spectral intensities (ordinates) in arbitrary units, au. Chemical shifts (abscissae) in parts per million, ppm. Nonparametric nonderivative and derivative magnitude envelopes in the FPT and dFPT, respectively, for the water residual and its surrounding at 4.65-4.75 ppm. Nonderivative lineshape: (a, \(m=0\)). Derivative lineshapes: (b, \(m=1\)), (c, \(m=2\)), (d, \(m=3\)), (e, \(m=4\)), (f, \(m=6\)), (g, \(m=8\)) and (h, \(m=10\)). For details, see the main text (color online)

In Fig. 2, the nonderivative envelope (a, \(m=0\)) is depicted first. Then come the nonparametric derivative envelopes (b, \(m=1\)), (c, \(m=2\)), (d, \(m=3\)), (e, \(m=4\)), (f, \(m=6\)), (g, \(m=8\)) and (h, \(m=10\)). In the nonderivative envelope (a), a sharp water residual peak appears near 4.71 ppm, surrounded by two very broad and strong resonances close to 4.690 and 4.715 ppm, presumably with some invisible structures. The water peak on panel (a) is not dominant at 4.65-4.75 ppm. The envelope (a) is superimposed on a quite elevated background, which attains about 1/3 of the water peak height. Already the first derivative envelope (b) is able to peer into some of the hidden structures by clearly delineating four peaks besides the water singlet at 4.71 ppm, which now becomes the dominant resonance in the band 4.65-4.75 ppm.

The dip near 4.705 ppm on panel (a), between the water peak and the wide resonance at 4.69 ppm, is transformed into a narrow and sharp peak on panel (b). Moreover, the background baseline on panel (b) is hugely reduced. It is this reduction (coupled with the simultaneous shrinking and tightening of the two large spectral bumps at 4.69 and 4.715 ppm) that allows the four smaller resonances to pop up distinctly. The wide tallest resonance on panel (a) is enormously squashed on panel (b) due to its splitting into two weaker peaks near 4.68 and 4.69 ppm. These two peaks appear wider than the remaining three resonances on panel (b).

The second derivative envelope (c), lying on a flat and straightened background, shows a different pace of peak height growth for the five resonances when passing from \(m=1\) (b) to \(m=2\) (c). The pattern seen here reflects the mantra of derivative processing: “the wider the peak, the slower the peak height rise with increasing derivative order m”. On account of this major feature of the dFPT, the wider peaks (4.68 and 4.69 ppm) on panel (b, \(m=1\)) become the smallest visible resonances on panel (c, \(m=2\)). Moreover, the two central, tallest peaks (water and its shoulder) are further narrowed on panel (c) and completely separated from each other all the way down to the chemical shift axis. The width narrowing is manifested in a concomitant peak height enhancement, as can be monitored for e.g. the water resonance: 120 au (a, \(m=0\)), 250 au (b, \(m=1\)) and 1500 au (c, \(m=2\)).

A similar trend of breadth narrowing and height increase continues gradually on panels (d-h) for the higher derivative orders \(3\le m\le 10.\) In the magnitude envelope, there is interference between the real and imaginary parts of the complex lineshape. With increasing derivative order m, the impact of this interference on the overall lineshape may change nonuniformly. Consequently, the spectral profiles may display some oscillatory patterns as m is increased. This is also seen in Fig. 2 by following the evolution of the two central resonances (4.705, 4.71 ppm). As stated, these resonances are wholly split apart down to the zero-valued background baseline on panel (c, \(m=2\)). However, they are not entirely separated down to the chemical shift axis on panels (d, \(m=3\)) and (e, \(m=4\)). It takes the higher derivative orders \(6\le m\le 10\) on panels (f-h) for the dip between these two peaks (4.705, 4.71 ppm) to descend completely to the chemical shift axis.
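The oscillatory behaviour can be reproduced in a minimal toy, assuming the simplified model in which each resonance contributes \(d_k/(\nu _k-\nu )\), with two hypothetical unit-amplitude components at \(\nu _{1,2}=\mp a+\textrm{i}\gamma\). At the midpoint \(\nu =0\), the magnitude of the \(m\,\)th derivative equals that of one component multiplied by \(|1+e^{\textrm{i}(m+1)\theta }|\), where \(e^{\textrm{i}\theta }=(\textrm{i}\gamma -a)/(\textrm{i}\gamma +a)\). This factor swings between 0 (full cancellation) and 2 (full reinforcement) as m grows, so the valley between two peaks need not deepen monotonically with m. For the hypothetical choice \(a=\gamma\), \(\theta =\pi /2\): the midpoint value vanishes exactly for \(m=1, 5, 9, \ldots \) and is reinforced for \(m=3, 7, \ldots \). This illustrates the mechanism only and does not model the resonances of Fig. 2.

```python
import math

def center_value(a, gamma, m):
    """|D_m F| at nu = 0 for F(nu) = 1/(nu_1 - nu) + 1/(nu_2 - nu),
    with nu_{1,2} = -/+ a + i*gamma (equal unit amplitudes, toy model)."""
    w = 1j * gamma                     # i*gamma - nu evaluated at nu = 0
    return abs(math.factorial(m) * ((w - a) ** -(m + 1) + (w + a) ** -(m + 1)))

def interference_factor(a, gamma, m):
    """Midpoint value divided by the magnitude of a single component there.

    Equals |1 + exp(i*(m+1)*theta)| with exp(i*theta) = (i*gamma - a)/(i*gamma + a).
    """
    single = math.factorial(m) / abs(1j * gamma - a) ** (m + 1)
    return center_value(a, gamma, m) / single

for m in range(8):
    print(m, round(interference_factor(1.0, 1.0, m), 6))
```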

The two wider peaks (4.690, 4.715 ppm) discussed for panel (a) are weakened stepwise for \(m=3\) (d), \(m=4\) (e) and \(m=6\) (f). They are almost invisible for \(m=8\) (g) and eventually disappear altogether from view for \(m=10\) (h). This again confirms that peak heights increase more slowly for broader than for sharper resonances. However, the wider resonances that vanish from the display for some derivative orders are still present within the low-lying background baseline (or immersed straight into the chemical shift axis) and can be pulled out from there by e.g. reducing the size of the ordinate, in the spirit of the aforementioned sequential visualization.

For \(m=0\) (a), the wide peak at 4.715 ppm has a small sharp shoulder near 4.730 ppm. This broad peak is squashed into the background already for \(m=1\) (b) and then disappears completely on panels (c-h, \(m=2{-}10\)). The only leftover of the wide peak on panels (b-h) is its former shoulder, which systematically develops into a well-isolated, sharp peak for \(1\le m\le 10.\) The remaining three thin peaks on the right column of Fig. 2 are unequivocally separated and also amenable to reliable quantification. Their peak areas can be computed exactly by any standard numerical quadrature rule, because the pertinent integration boundaries can be determined unambiguously, as is best done from panel (h). Of the three peaks that ’survived’ the action of the derivative operator \(\textrm{D}_m\, (1\le m\le 10),\) only the water resonance is marked with its chemical formula. The other two resonances can be assigned to the N-acetyl aspartate (NAA) metabolite. These resonances are physical (i.e. their corresponding metabolites are genuinely present in the benign sample), as shown by their stable existence in the spectrum with increasing derivative order m.
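The quadrature step can be sketched as follows, assuming (as a choice made for this illustration, not necessarily the procedure of Refs. [24, 25]) that the integration boundaries are the local minima flanking a peak, found by walking outward from its maximum, and that the composite trapezoidal rule is used. The toy spectrum is the \(m=1\) magnitude of a simplified model in which each resonance contributes \(d_k/(\nu _k-\nu )\); for one isolated component this profile is \(|d_k|/((\nu -\textrm{Re}\,\nu _k)^2+\gamma ^2)\), whose total area \(\pi |d_k|/\gamma\) serves as a reference. The two unit-amplitude components and \(\gamma =0.01\) are hypothetical.

```python
import numpy as np

def peak_bounds(y, i_peak):
    """Walk outward from a local maximum until the profile stops decreasing.

    Returns the indices of the flanking local minima (or of the grid edges)."""
    lo = i_peak
    while lo > 0 and y[lo - 1] <= y[lo]:
        lo -= 1
    hi = i_peak
    while hi < len(y) - 1 and y[hi + 1] <= y[hi]:
        hi += 1
    return lo, hi

def trapezoid_area(x, y):
    """Composite trapezoidal rule on a sampled profile."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def isolated_peak_area(x, y, i_peak):
    """Area of the peak at index i_peak between its automatically found boundaries."""
    lo, hi = peak_bounds(y, i_peak)
    return trapezoid_area(x[lo:hi + 1], y[lo:hi + 1]), (x[lo], x[hi])

# Hypothetical m = 1 magnitude spectrum: unit amplitudes at -1 and +1, half-width 0.01.
gamma = 0.01
x = np.linspace(-3.0, 3.0, 60001)
y = np.abs(1.0 / (-1.0 + 1j * gamma - x) ** 2 + 1.0 / (1.0 + 1j * gamma - x) ** 2)
i_left = int(np.argmax(np.where(x < 0.0, y, 0.0)))   # maximum of the left-hand peak
area, (x_lo, x_hi) = isolated_peak_area(x, y, i_left)
print(area, np.pi / gamma, x_lo, x_hi)
```

The walk stops at the grid edge on one side and at the midpoint between the peaks on the other, so the computed area differs from \(\pi /\gamma \approx 314.16\) only through the truncated tail and the small overlapping tail of the neighbour (a fraction of a percent here). The same check would fail for \(m=0\): a magnitude-mode Lorentzian decays only as \(1/|\nu |\), so its area diverges logarithmically, whereas derivative orders \(m\ge 1\) make the tails integrable.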

It then follows that the nonparametric derivative dFPT for the illustrated narrow frequency band 4.65-4.75 ppm, containing the water residual, is capable of simultaneously enhancing resolution and SNR:

\(\bullet \) Resolution improvement. The water residual compound is sharply resolved into its five main components. One of these constituents is the resonance for the water molecule itself at 4.71 ppm. The remaining four peaks can be assigned to some other metabolites (presently assumed to be from the NAA multiplet).

\(\bullet \) SNR improvement. The background baseline on panel (a, \(m=0\)) is notably high, as it amounts to about one third of the dominant peak located at 4.69 ppm. Such a strong background masks most of the hidden structure in the nonderivative envelope (a). However, this obstacle is swiftly weakened already in the first two derivative envelopes (b: \(m=1\), c: \(m=2\)) and completely removed for the higher derivatives (d-h, \(3\le m\le 10).\)

What matters most in the expounded derivative shape estimations is not only the visualization of hidden resonances, but also their quantification. This new quantification does not even address, let alone solve, the explicit traditional spectral analysis problem (the quantification problem); i.e. there is no rooting of characteristic or secular polynomials, etc. Moreover, quantification by derivative shape estimation is not limited to determining solely the aforementioned peak areas of the isolated resonances. Quite the contrary, the peak positions, widths and heights of these resonances can be found as well. However, all such findings are valid under the proviso that the apparently ’isolated’ resonances are not structured. One way to check this condition is to keep increasing the derivative order m. This kind of verification is performed in Fig. 2 up to \(m=10,\) at which point it became self-evident that only the three main components persisted, with no further splitting. They can be perfectly quantified.

Nevertheless, to validate this procedure of shape estimation, we resort to exact parametric processing. For this purpose, the parametric variants of the FPT and dFPT are employed with \(0\le m\le 4.\) The corresponding reconstructions are given in Fig. 3, where they are compared with the nonparametric versions of the FPT and dFPT for \(0\le m\le 4.\) In this figure, the left (a-d) and the right (e-h) columns are allocated to the nonparametric and parametric Padé estimations, respectively. The nonderivative spectra for \(m=0\) are on the first row (a, e). These are followed by the derivative spectra for \(m=1, 3\) and 4 on the 2nd (b, f), 3rd (c, g) and 4th (d, h) rows, respectively.

Fig. 3

Proton MRS for ovarian benign cyst fluid. Spectral intensities (ordinates) in arbitrary units, au. Chemical shifts (abscissae) in parts per million, ppm. Nonderivative and derivative magnitude spectra in the FPT and dFPT, respectively, for the water residual and its surrounding at 4.65-4.75 ppm. Envelopes, the nonparametric FPT and dFPT: left column (a-d). Components, the parametric FPT and dFPT: right column (e-h). Nonderivative spectra: \(m=0\) (a, e). Derivative spectra: \(m=1\) (b, f), \(m=3\) (c, g) and \(m=4\) (d, h). For details, see the main text (color online)

As just discussed, Padé shape estimations must apply an additional operation, a derivative transform, to uncover some of the true individual resonances hidden in the given envelope. This supplementary mathematical procedure is needed because the nonparametric nonderivative envelope is an amalgam of its invisible constituents. By contrast, Padé parameter estimation, already in its nonderivative variant (\(m=0\)), provides all the true components contained in the same envelope. This stems from solving (explicitly and exactly) the quantification problem: all the peak parameters (position, width, height, phase) are first reconstructed from the input time signals, and then the components (whose sum yields the envelope) are generated. It is this parametric envelope that is verified to be identical to the nonparametric envelope (both in the nonderivative FPT).
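The idea of parameter estimation, recovering positions, widths and amplitudes from the time signal before any spectrum is formed, can be illustrated with Prony's linear-prediction method. This is a classical relative of the parametric FPT and is swapped in here purely for illustration; it is not the FPT. For a noiseless FID \(c_n=\sum _k d_k u_k^n\) with \(u_k=\exp (2\pi \textrm{i}\tau \nu _k)\), the \(u_k\) are the roots of a characteristic polynomial whose coefficients follow from a linear system built from the \(c_n\), and the \(d_k\) then follow from a Vandermonde system. The sampling time \(\tau =1\), the number of components K = 3 and all parameter values are hypothetical.

```python
import numpy as np

def prony(c, K):
    """Recover K complex u_k and amplitudes d_k from noiseless c_n = sum_k d_k u_k**n."""
    c = np.asarray(c, dtype=complex)
    N = len(c)
    # Linear prediction: c_{n+K} + sum_{j<K} a_j c_{n+j} = 0 for n = 0..N-K-1.
    H = np.array([c[n:n + K] for n in range(N - K)])
    a, *_ = np.linalg.lstsq(H, -c[K:], rcond=None)
    # Characteristic polynomial u**K + a_{K-1} u**(K-1) + ... + a_0 (highest power first).
    u = np.roots(np.concatenate(([1.0], a[::-1])))
    # Amplitudes from the Vandermonde system V d = c with V[n, k] = u_k**n.
    V = np.vander(u, N, increasing=True).T
    d, *_ = np.linalg.lstsq(V, c, rcond=None)
    return u, d

tau = 1.0                                                            # hypothetical sampling time
nu_true = np.array([-0.20 + 0.010j, 0.05 + 0.005j, 0.12 + 0.020j])   # Im > 0: decaying
d_true = np.array([1.0, 0.5 + 0.2j, 0.8])
n = np.arange(64)
u_true = np.exp(2j * np.pi * tau * nu_true)
fid = (d_true[None, :] * u_true[None, :] ** n[:, None]).sum(axis=1)

u_est, d_est = prony(fid, K=3)
nu_est = np.log(u_est) / (2j * np.pi * tau)    # back to complex frequencies
order = np.argsort(nu_est.real)
print(np.round(nu_est[order], 6))
print(np.round(d_est[order], 6))
```

This naive sketch assumes noiseless data and a known K; it makes no attempt at the noise handling or at the separation of genuine from spurious resonances that a practical parametric method requires.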

Thus, for \(m=0\) in Fig. 3, the opaque content of the envelope (a) becomes transparent through the corresponding components (e). About a dozen components are visible on panel (e), all of which are partially or completely intertwined. Therein, three very wide and intense resonances fully overlay several thinner and weaker resonances. The overall transparency on panel (e) is further diminished by three additional wide resonances of lower intensities that cover the four smaller peaks. On panel (e), the water residual co-resonates with the middle, widest resonance (both peaks being centered at the same chemical shift, 4.71 ppm).

The two wide peaks (4.69, 4.715 ppm) on panel (e), surrounding the water remnant, correspond to the two bulky resonances near \(\mathrm{H_2O}\) on panel (a). A destructive interference of the components (e) yields a clear delineation of the water peak on panel (a). In reality, it is not the magnitude-mode components that interfere, but rather their real and imaginary parts. The elevated background in the envelope (a) can now be understood to be due to (constructive and destructive) interference of the real and imaginary parts of the wide components (e). Of course, the components (e) themselves are not explicitly built into the nonparametrically reconstructed envelope (a). Nevertheless, they are implicitly present on panel (a) since, as stated, the nonparametric and parametric envelopes from the nonderivative FPT always coincide.
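What "interference of the real and imaginary parts" means in practice can be seen in a two-component toy. Assuming a simplified model in which each resonance contributes \(d_k/(\nu _k-\nu )\) (a stand-in for, not a reproduction of, the Padé components), the envelope is the complex sum of the components. Its magnitude can therefore fall below the stronger component (destructive interference) or rise above it (constructive interference), but it can never exceed the sum of the component magnitudes. Positions, widths and amplitudes below are hypothetical.

```python
import numpy as np

def components(nu, freqs, amps):
    """Complex components d_k / (nu_k - nu) of the toy model, one row per resonance."""
    return np.array([d / (f - nu) for f, d in zip(freqs, amps)])

gamma, a = 1.0, 2.5                    # hypothetical: unit amplitudes at -a and +a
nu = np.linspace(-6.0, 6.0, 1201)
comp = components(nu, [-a + 1j * gamma, a + 1j * gamma], [1.0, 1.0])
envelope = comp.sum(axis=0)            # complex sum of the components
mag_env = np.abs(envelope)             # magnitude-mode envelope
mag_max = np.abs(comp).max(axis=0)     # strongest single component at each nu
mag_sum = np.abs(comp).sum(axis=0)     # naive addition of magnitudes

i0 = int(np.argmin(np.abs(nu)))        # midpoint between the two peaks
ip = int(np.argmin(np.abs(nu - a)))    # centre of the right-hand peak
print(mag_env[i0], mag_max[i0])        # destructive: envelope below either component
print(mag_env[ip], mag_max[ip])        # constructive: envelope above the stronger one
```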

The first derivative components (f, \(m=1\)) are sparser, mainly because the three low-lying wide resonances have merged with the background baseline. The three wide and tallest components (e, \(m=0\)) are drastically diminished on panel (f, \(m=1\)). The result is a clearer appearance of the four smaller and narrower components on panel (f). This configuration of components (f) is also quite clearly reflected in the corresponding envelope (b). In the third derivative components (g, \(m=3\)), the leftovers of the three wide resonances have completely vanished into the chemical shift axis. The remaining five components (g) are mirrored in the envelope (c). This is further reinforced by the remarkable concordance between the fourth derivative components (h) and envelope (d) for \(m=4\).

Moreover, we checked that the higher derivatives \(5\le m\le 10\) (not shown in Fig. 3, to avoid clutter) of the components and the envelope are fully consistent. In particular, the tenth derivative components are found to be identical to the corresponding envelope from Fig. 2h. In other words, the latter envelope for \(m=10\) from the nonparametric dFPT collapses uniquely onto the associated exact components from the parametric dFPT. Such a reconstruction of the true components from the input nonparametric envelope amounts to exact quantification by shape estimation alone.
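The collapse of a derivative envelope onto its components can be mimicked with a simplified model, assumed here rather than taken from the parametric dFPT, in which each resonance contributes \(d_k/(\nu _k-\nu )\) and \(\textrm{D}_m\) gives \(m!\,d_k/(\nu _k-\nu )^{m+1}\). The sketch measures the largest gap between the sum of the component magnitudes and the magnitude envelope, relative to the envelope maximum, for two hypothetical components five half-widths apart. The gap is bounded by twice the relative size of either component at the midpoint, and that bound shrinks geometrically with m, so the envelope becomes practically indistinguishable from the isolated components.

```python
import math
import numpy as np

def derivative_components(nu, freqs, amps, m):
    """Complex components m! d_k / (nu_k - nu)**(m + 1) after applying D_m (toy model)."""
    return np.array([math.factorial(m) * d / (f - nu) ** (m + 1) for f, d in zip(freqs, amps)])

def envelope_mismatch(nu, freqs, amps, m):
    """Largest gap between the summed component magnitudes and the magnitude envelope,
    relative to the envelope maximum (0 means the envelope is just its components)."""
    comp = derivative_components(nu, freqs, amps, m)
    env = np.abs(comp.sum(axis=0))
    return float(np.max(np.abs(comp).sum(axis=0) - env) / np.max(env))

nu = np.linspace(-6.0, 6.0, 2401)
freqs = [-2.5 + 1.0j, 2.5 + 1.0j]      # hypothetical: separation of 5 half-widths
amps = [1.0, 1.0]
for m in (0, 2, 4, 6, 8, 10):
    print(m, envelope_mismatch(nu, freqs, amps, m))
```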

The chief goal of treating the water residual in the manner explained in Figs. 2 and 3 is to maximally localize this initially wide conglomerate. The performed analysis shows that derivative shape estimations by the nonparametric dFPT can successfully solve this problem by dramatically narrowing the broad bottom (base) of the water residual and its underlying constituents. This is possible by effectively cutting off all the long tails within the zoomed band 4.65-4.75 ppm, which, besides the thin water singlet, also contains a number of other resonances (wide and narrow alike). Finally, in this tight band, even after the 10th derivative of the nonparametric dFPT (Fig. 2h), three sharp resonances persist in the display, with the water peak dominating. These remaining resonances can be quantified exactly. Afterward, the ordinate can be reduced to visualize and then likewise quantify the remaining resonances hidden beneath the original water residual. The small frequency band 4.65-4.75 ppm is chosen as an illustrative example, but precisely the same procedure of coupling sequential visualization and quantification can be employed for any other frequency interval within the entire Nyquist range.

As a case in point, the next frequency band considered in the analysis is another narrow region, 2.80-3.05 ppm. This interval, depicted in Fig. 4, contains the citrate quartet, Cit(q). Since our main interest here is in Cit(q), the other resonances seen in Fig. 4 are left unmarked, but they can be assigned to the NAA multiplet, NAA(m). Figure 4 shows the Padé spectra for the benign and malignant samples, reconstructed by nonparametric and parametric estimations using nonderivative and derivative processing. In Fig. 4, the envelopes and components from the nonparametric and parametric computations are in the left and right columns, respectively. The results for the benign sample (a: envelope, b: components) refer only to the nonderivative FPT (\(m=0\)). The reconstructions for the malignant sample on panels (c-h) are for the nonderivative and derivative envelopes and components \((0\le m\le 2).\) For this case, the envelopes are on panels (c, \(m=0\)), (e, \(m=1\)) and (g, \(m=2\)), whereas the components are on panels (d, \(m=0\)), (f, \(m=1\)) and (h, \(m=2\)).

Fig. 4

Proton MRS for ovarian cyst fluid (benign, malignant). Spectral intensities (ordinates) in arbitrary units, au. Chemical shifts (abscissae) in parts per million, ppm. Nonderivative and derivative magnitude spectra in the FPT and dFPT, respectively, for the citrate quartet and its surrounding at 2.80-3.05 ppm. Envelopes, the nonparametric FPT and dFPT: left column (a, c, e, g). Components, the parametric FPT and dFPT: right column (b, d, f, h). Samples: benign (a, b) and malignant (c-h). Nonderivative spectra: \(m=0\) (a-d). Derivative spectra: \(m=1\) (e, f) and \(m=2\) (g, h). For details, see the main text (color online)

In the nonderivative envelope (a: benign), the citrate quartet very clearly exhibits its recognizable structure. The four peaks in Cit(q) are arranged symmetrically. The inter-peak separations are the same in the two pairs of resonances on the left- and right-hand sides of the constellation. The two outer peaks are lower than the two inner peaks, as expected from MRS theory. Yet, no quantification is possible from this spectral configuration (a, \(m=0\)) because of the overlapping bases of the individual resonances in the two pairs of peaks in Cit(q). Moreover, it is impossible to know from panel (a) whether any of the four peaks are single or structured resonances. As a matter of fact, they harbor some hidden peaks, as seen from panel (b, \(m=0\)) for the corresponding components in the parametric FPT. Herein, beneath each of the four peaks from Cit(q), there is a well-delineated, bell-shaped symmetric resonance profile. Therefore, for the nonparametric quantitative reconstructions to be acceptable, it is necessary to apply the derivative shape estimations by the dFPT (\(m\ge 1\)) in an attempt to gradually overcome the insufficient resolution and SNR of the nonderivative FPT (a, \(m=0\)).
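The equal spacings within each pair and the lower outer peaks noted above are what a strongly coupled two-spin (AB) pattern predicts. Treating Cit(q) as an AB spin system (an assumption of this sketch, with a hypothetical shift difference and coupling constant), the standard AB formulas give the stick positions and relative intensities below; real lineshapes are additionally broadened and overlapping, as on panel (a).

```python
import math

def ab_quartet(center_hz, delta_nu_hz, j_hz):
    """Stick spectrum of an AB spin system as (position_hz, relative_intensity) pairs.

    delta_nu_hz: chemical-shift difference between A and B; j_hz: coupling constant.
    """
    d = math.hypot(delta_nu_hz, j_hz)   # D = sqrt(delta_nu^2 + J^2)
    s = j_hz / d                        # sin(2*theta), sets the "roof" effect
    return [
        (center_hz - (d + j_hz) / 2, 1 - s),   # outer line
        (center_hz - (d - j_hz) / 2, 1 + s),   # inner line
        (center_hz + (d - j_hz) / 2, 1 + s),   # inner line
        (center_hz + (d + j_hz) / 2, 1 - s),   # outer line
    ]

# Hypothetical parameters, chosen only for illustration.
lines = ab_quartet(center_hz=0.0, delta_nu_hz=40.0, j_hz=15.0)
for pos, inten in lines:
    print(f"{pos:8.2f} Hz  intensity {inten:.3f}")
```

In this model, the spacing within each outer/inner pair equals \(J\), and the inner-to-outer intensity ratio is \((D+J)/(D-J)\) with \(D=\sqrt{\Delta\nu^2+J^2}\), so the outer lines are always the weaker ones.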

For the malignant sample in the nonderivative case (\(m=0\)), the envelope (c) and components (d) show two visibly different patterns. Herein, the spectral configurations are much more involved than those on panels (a, b). In the envelope (c) and components (d), a high spectral complexity is seen throughout the band 2.80-3.05 ppm. The envelope (c) is heavily distorted, especially around the left pair of peaks in Cit(q) within 2.975-3.05 ppm. The envelope peak symmetry from panel (a) is not replicated on the companion panel (c). For instance, the fourth citrate peak near 3.02 ppm (c) is the tallest and widest resonance at 2.80-3.05 ppm. This peak has hidden shoulders on each side, pointing to an inner structure. It is surrounded by two intense and wide resonances of nearly the same strength.

On panel (a: benign), the flat region between the two inner citrate peaks is nearly empty. By contrast, the same frequency region on panel (c: malignant) is highly and irregularly structured, showing many wiggly and skewed lineshapes. If no quantification of Cit(q) is feasible from the relatively clean spectrum on panel (a), as stated, it is even more out of the question for panel (c). In other words, to give nonparametric information extraction a chance, derivative estimations are also needed in the frequency window 2.80-3.05 ppm on panel (c).

However, before proceeding with such a step, it is instructive to inspect the wealth of spectral features in the nonderivative components on panel (d). Herein, the middle portion between the two inner peaks of Cit(q) is populated mainly by the NAA multiplet, NAA(m), which is not labeled, as noted. Among these resonances, the most prominent is the NAA triplet peak, NAA(t), located very near the center (2.925 ppm) of the band 2.80-3.05 ppm. Much of the middle section on panel (d) is obscured because it is overlaid by the long tails of the taller resonances.

By contrast, even a quick perusal of the immediate environment of the fourth citrate component peak on the far left of panel (d) can explain the lineshape deformations in the envelope (c) at 3.00-3.05 ppm. Therein, tight spectral crowding is reflected in the presence of some nine visible resonances. One of these resonances (3.025 ppm) is nearly as tall as the fourth citrate peak itself (3.020 ppm), but about five times wider. On each side of the fourth citrate peak, within 3.005-3.035 ppm, there are two wider intense peaks that completely overlay the four smaller symmetric resonances. Moreover, one half of a wider peak, centered very close to the left edge (3.05 ppm) of the band, can also be seen. The widest component resonance near 3.025 ppm on panel (d) could be assigned to a larger molecule, e.g. one of the proteins left in the sample after deproteinization with the 10 kDa molecular mass cut-off mentioned earlier.

It is an intricate mixture of strong destructive and constructive interference effects among all nine components (d) within the region 3.00-3.05 ppm that creates the three wide peaks in the envelope (c). Again, this is not meant to be taken literally, as if the components (d) explicitly built the envelope (c). However, the inherent presence of such components (d) is felt in the envelope (c) too. This follows from the fact that the same envelope (c) from the nonparametric FPT is also obtained from the parametric FPT.
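The role of interference can be made concrete with two hypothetical components. This sketch assumes complex Lorentzian components whose magnitude-mode envelope is the modulus of their complex sum; the positions, widths and phases are invented, loosely echoing a narrow line at 3.020 ppm next to an equally tall line five times wider at 3.025 ppm. The same two components raise or lower the apparent peak at 3.020 ppm depending only on their relative phase.

```python
import numpy as np

def component(nu, d, nu0, gamma):
    """Complex Lorentzian component d / (gamma + i*(nu - nu0)); arg(d) is its phase."""
    return d / (gamma + 1j * (nu - nu0))

def envelope(nu, comps):
    """Complex envelope as the sum of complex components (not of their magnitudes)."""
    return sum(component(nu, *c) for c in comps)

# Hypothetical pair: a narrow line and one five times wider, given equal
# peak magnitudes (|d| / gamma = 1000 for both).
narrow = (1.0, 3.020, 0.001)

def wide(phase):
    return (5.0 * np.exp(1j * phase), 3.025, 0.005)

nu = np.linspace(3.00, 3.05, 5001)
for label, phase in (("in phase", 0.0), ("opposite phase", np.pi)):
    comps = [narrow, wide(phase)]
    env_mag = np.abs(envelope(nu, comps))
    sum_of_mags = sum(np.abs(component(nu, *c)) for c in comps)
    # Height at 3.020 ppm versus the narrow line alone (1000 in these units).
    print(label, abs(envelope(3.020, comps)), env_mag.max(), sum_of_mags.max())
```

By the triangle inequality, the envelope magnitude never exceeds the sum of the component magnitudes, so a magnitude envelope alone cannot reveal how many components it hides or with what phases.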

The hypothetical components assumed to be folded into the envelope (c) could be deconvolved by the derivative transforms to eventually mimic the true components (d). This can be carried out by following the road already paved for the water residual of the benign sample in Figs. 2 and 3. Continuing from panels (c, d) of Fig. 4 for the malignant sample, we show the derivative envelopes (e: \(m=1\), g: \(m=2\)) and components (f: \(m=1\), h: \(m=2\)) from the nonparametric and parametric dFPT, respectively.

Already, the first derivative envelope (e) achieves a remarkable improvement over its nonderivative predecessor (c). On panel (e), the fourth peak (3.020 ppm) of Cit(q) is thoroughly cleared of its two strong, wider neighboring resonances. The earlier distortions of this citrate constituent at 3.02 ppm led to its erroneously enhanced peak intensity on panel (c). However, with much of its close environment diminished, this fourth peak acquires a new height, which is much closer to the usual relative intensity ratios in Cit(q).
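Why even the first derivative clears a narrow peak of its wider neighbors can be seen from the closed-form \(m\)-th derivative of a single complex Lorentzian, whose peak magnitude scales as \(m!\,|d|/\gamma^{m+1}\). Under the assumptions of this sketch (isolated Lorentzian components, with hypothetical amplitudes and widths chosen so that the wide neighbor is five times broader but equally tall at \(m=0\), loosely echoing the resonances around 3.02 ppm), each derivative order suppresses the wide line by a further factor of five relative to the narrow one.

```python
import math

def derivative_component(nu, d, nu0, gamma, m):
    """m-th derivative in nu of d / (nu - nu0 - i*gamma), evaluated in closed form."""
    return d * (-1) ** m * math.factorial(m) * (nu - nu0 - 1j * gamma) ** (-(m + 1))

def peak_magnitude(d, gamma, m):
    # At nu = nu0 the closed form reduces to m! |d| / gamma^(m+1).
    return math.factorial(m) * abs(d) / gamma ** (m + 1)

# Hypothetical narrow citrate-like line and a neighbor five times wider,
# both with the same nonderivative (m = 0) peak magnitude.
d_narrow, g_narrow = 1.0, 0.001
d_wide, g_wide = 5.0, 0.005
for m in range(4):
    ratio = peak_magnitude(d_wide, g_wide, m) / peak_magnitude(d_narrow, g_narrow, m)
    print(f"m={m}: wide/narrow peak ratio = {ratio:.4f}")   # equals (1/5)**m
```

The ratio \((\gamma_{\rm narrow}/\gamma_{\rm wide})^m\) is independent of where the lines sit, which is why derivative estimation preferentially removes broad underlying features.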

The middle part of the envelope, formerly bumpy and filled with unrecognizable lineshapes on panel (c), becomes cleaned up and reasonably well delineated on panel (e). It clearly exhibits the NAA(m) peaks with their central structure, NAA(t). It is pleasing to see that most of the first derivative components (f) are strikingly reminiscent of the envelope (e). Nevertheless, an astute reader would notice that the fourth citrate resonance at 3.020 ppm is slightly taller in the envelope (e) than in its component counterpart (f). This is caused by interference of this fourth citrate peak on panel (e) with its low-lying neighbors at 3.00-3.04 ppm. The peak heights of the other three resonances in Cit(q) agree well (in terms of intensity ratios) between the envelope (e) and components (f).

The second derivative spectra are given on panels (g) and (h) for the envelope and components, respectively. Compared to the envelope on panel (e, \(m=1\)), the envelope on panel (g, \(m=2\)) is substantially refined. Thus, most of the lower parts of the lineshapes are much better delineated in the second (g) than in the first (e) derivative envelope. In particular, the NAA(m) resonances in the envelope (g), situated between the two inner peaks of Cit(q), show many well-resolved bell-shaped peaks, again dominated by NAA(t) at 2.930 ppm. Additionally, a weaker NAA triplet peak, NAA(t), clearly emerges at 2.875 ppm. The fourth citrate peak at 3.02 ppm on panel (g) now has a slightly lower height than on panel (e). The reason is reduced interference of this Cit(q) peak with its adjacent resonances, which have separated out as sharp, thin lineshape profiles.

Crucially, these findings (with all the spectral details within 2.80-3.05 ppm, including the NAA multiplet) from the second derivative envelope (g) are corroborated by the corresponding second derivative components (h). As discussed, the first three Cit(q) peaks in the envelope (e) and components (f) were already in agreement for the first derivative transform. This accord persists for the second derivative spectra (g: envelope) and (h: components). Furthermore, the fourth peak of Cit(q) is now also the same on panels (g) and (h). It is gratifying to observe that such a high degree of concordance between the envelope (g) and components (h) could be achieved with only low-order derivative signal processing, merely the second derivative (\(m=2\)), in the nonparametric and parametric dFPT, respectively.

3.3 Clinical aspects for differential diagnosis (benign versus malignant)

Regarding the benign sample, as discussed, only the nonderivative FPT is presented for the envelope (a) and components (b) in Fig. 4. This can readily be extended to the derivative nonparametric and parametric dFPT, similarly to the analysis just performed for the malignant sample. Derivative data analysis is necessary in nonparametric processing (i.e. shape estimation) for extracting diagnostically interpretable quantitative information, such as metabolite concentrations. However, even the nonderivative spectra (\(m=0\)) can give an important first insight of clinical relevance for the citrate metabolite, which is the main focus of Fig. 4.

In fact, returning to Fig. 1 for the wider band 0.870-5.125 ppm, it can be seen that the Cit(q) resonance is notably more intense in the benign (c) than in the malignant (d) sample in the nonparametric nonderivative envelopes. This becomes more evident when zooming into the narrow band 2.80-3.05 ppm on panels (a) and (c) of Fig. 4 for the nonparametric nonderivative envelope.

Still, for the two different cases of the ovarian cyst fluid, the most insightful comparison of the spectra at the level of nonderivative estimations is provided by juxtaposing the components for the benign (b) and malignant (d) samples in Fig. 4. Therein, the maximum values of the component lineshapes for the inner peaks of Cit(q) are read off as 225 au for the benign sample (b), compared to only about 45 au for the malignant sample (d). Similarly, the profiles of the outer peaks in Cit(q) have their maxima at about 175 au on panel (b: benign) versus merely 35 au on panel (d: malignant). The ratio of these maxima for the benign (b) relative to the malignant (d) sample is 5 for both the inner and the outer pairs of component peaks in the citrate quartet, Cit(q).

Such a sizable difference in the case of the two patients under consideration can provisionally promote the citrate metabolite Cit(q) to the category of cancer biomarkers when characterizing the ovarian cyst fluid (benign versus malignant). This might be all the more justified given that, e.g., the total choline tCho, one of the prominently recognized cancer biomarkers, is about 3.16 (142/45) times more abundant in the malignant than in the benign ovarian cyst fluid sample for the same two patients from the present study as well as from Refs. [24, 25]. Moreover, this latter quotient is consistent with Ref. [26], which reported a similar concentration ratio [tCho](malignant)/[tCho](benign)=2.8 for cohorts of 12 and 28 patients diagnosed histopathologically with malignant and benign ovarian cysts, respectively.

As mentioned, a number of metabolites (Iso, Val, Thr, Ala, Lac, Lys, Met, Glu, tCho, ...) are more abundant in the malignant than in the benign specimens of the ovarian cyst fluid. For instance, the level of the sum Lac(d)+Lac(q) is about 18.45 times higher in the malignant than in the benign sample. Further, in the previous examination [26] of the same problem, Fourier processing followed by Lorentzian fitting was unable to resolve tCho into its three components: free choline (Cho), phosphocholine (PC) and glycerophosphocholine (GPC). However, such a separation was possible with the Padé-based estimations [24, 25]. It follows from these latter studies, as well as from the present work, that the malignant/benign relative levels of Cho, PC and GPC are 2.67 (40/15), 3.50 (70/20) and 3.20 (32/10), respectively.

On the other hand, for the same benign and malignant samples of the two patients, the converse holds for several other metabolites, including Cit, Tau, \(\beta -\)Glc, m-Ins, etc. Thus, the benign/malignant relative levels of Cit, Tau, \(\beta -\)Glc and m-Ins are 5 (175/35), 1.54 (185/120), 1.67 (50/30) and 6 (30/5), respectively. These quotients indicate that low levels of the Cit and m-Ins metabolites could be considered as cancer biomarkers. At the same time, the corresponding ratios for the Tau and \(\beta -\)Glc metabolites do not satisfy the conventional cut-point (e.g. 3 or so) for inclusion in the list of cancer biomarkers. In Ref. [26], none of these metabolites (Cit, Tau, \(\beta -\)Glc, m-Ins) was quantified because their resonances overlap with neighboring peaks.
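The quotients quoted in this subsection follow directly from the read-off peak values. The short script below simply recomputes them and, for the metabolites that are lower in the malignant sample, compares them with a cut-point of 3, taken from the "e.g. 3 or so" criterion above (the exact threshold is an assumption). It is bookkeeping only, not a statistical test.

```python
# Peak values (au) as read off the spectra in the text.
# Each entry: metabolite -> (value in the higher-level sample, value in the lower-level sample).
malignant_over_benign = {"tCho": (142, 45), "Cho": (40, 15), "PC": (70, 20), "GPC": (32, 10)}
benign_over_malignant = {
    "Cit (inner peaks)": (225, 45),
    "Cit (outer peaks)": (175, 35),
    "Tau": (185, 120),
    "beta-Glc": (50, 30),
    "m-Ins": (30, 5),
}
CUT_POINT = 3.0   # the "e.g. 3 or so" threshold; the exact value is an assumption

def ratios(table):
    """Quotient of the two read-off values for every metabolite in the table."""
    return {name: num / den for name, (num, den) in table.items()}

for name, r in ratios(malignant_over_benign).items():
    print(f"malignant/benign {name:18s} {r:5.2f}")
for name, r in ratios(benign_over_malignant).items():
    flag = "meets" if r >= CUT_POINT else "below"
    print(f"benign/malignant {name:18s} {r:5.2f}  ({flag} cut-point {CUT_POINT})")
```

Only Cit (both peak pairs) and m-Ins pass the assumed threshold among the metabolites that are depleted in the malignant sample, in line with the discussion above.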

From the biochemical viewpoint, it is important to understand the meaning and diagnostic repercussions of the different pathways of ovarian metabolism, which determine whether the malignant-to-benign sample ratios for various metabolites are greater or smaller than unity. This, in turn, would have a critical impact on decision making regarding the vitally important differential diagnosis (benign or cancerous). As underlined, this quotient is greater than 1 for, e.g., lactates or cholines, whereas it is smaller than 1 for, e.g., citrate or myo-inositol. The overall complexity of the different metabolic pathways depends on the kind of scanned sample (body fluid, tissue, ...) on top of the specific organ dependence. It is often unknown which biochemical mechanism governs the increased or decreased levels of a metabolite in malignant compared to benign samples.

For example, the malignant/benign level ratio greater than 1 for Lys has not been explained biochemically in the literature. However, the higher levels of \(\beta -\)HB in malignant versus benign ovarian cysts are viewed as being caused by the rapid cellular metabolism in cancerous specimens. On the other hand, increased levels of branched-chain amino acids (Iso, Leu and Val) are considered to be among the products of protein breakdown. It is particularly interesting to note that the same Warburg effect, i.e. enhanced glycolysis, is taken to be at the root of both the elevation of Lac and Ala and the depletion of Glc.

The depletion of Cit in malignant ovarian cyst fluid is thought to be linked to its usage in fatty acid synthesis, which is needed for the rapid turnover of cancerous cells. In fatty acid biosynthesis within ovarian metabolism, inhibition of the enzyme ATP-citrate lyase delays tumor progression. Such a citrate-consuming metabolic pathway, which supports tumor growth, yields the lower Cit level in the malignant relative to the benign ovarian cyst fluid lesions. This is consistent with the common occurrence that healthy ovarian epithelial cells are able to synthesize and secrete notably high levels of citrate. A similar argument also holds for normal prostate epithelial cells. For the prostate, citrate is one of the recognized cancer biomarkers, distinguishable by its depleted levels in malignant versus benign samples. The extent of the Cit depletion depends on various factors, including the tumor grade.

Overall, it would seem reasonable to consider citrate as a cancer biomarker also for the ovary. This proposal is based on our analysis of two patients. Whether it can be corroborated by including all 40 patients from Ref. [26] remains to be seen in a continuation of the present study in the near future.

4 Conclusion

Generally, inverse problems are about searching for the possible causes of observed effects. Medicine as a whole is essentially based on this very concept. Physics and chemistry also abound with inverse problems. Suffice it to mention inferences about the molecular structure of species based on the system responses to external perturbations. Much of what can be experimentally measured in the sciences falls within the realm of inverse problems.

Presently, we address a real-world inverse problem in cancer medicine within ovarian tumor diagnostics by means of nuclear magnetic resonance (NMR) spectroscopy. The measured time signals from the ovarian cyst fluid of the examined patients contain the metabolite fingerprints of the scanned samples. The goal is to correlate the NMR reconstructions with the prior histopathological diagnoses, benign (serous cystadenoma) and malignant (serous cystadenocarcinoma).

The inverse problem in this field of metabolomics within NMR spectroscopy consists of reconstructing the chemical composition of the scanned tissues or biofluids by analyzing the encoded time signals. This amounts to solving the spectral analysis problem (the quantification problem) to find the fundamental parameters, the complex resonance frequencies and amplitudes of the encoded time signal oscillations. Such signatures correspond to the peak positions, widths, heights and phases of the individual spectral lineshape profiles (components) in the computed total shape spectrum (envelope) as a function of the sweep frequency.

We show how this mathematically ill-conditioned problem can be robustly solved by the nonparametric derivative fast Padé transform (dFPT), which is a shape estimator. This processor does not even address, let alone explicitly solve, the quantification problem. Its quantification results (retrieval of the components from an envelope) are corroborated by the corresponding parametric dFPT. The supplementary frequency-dependent derivative operator can be applied to a spectrum reconstructed either by shape or by parameter estimation.
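For readers unfamiliar with the Padé route from an encoded signal to both an envelope and its components, the following is a minimal, noiseless sketch. It is not the authors' FPT or dFPT implementation. It assumes an exact sum of \(K\) damped complex exponentials \(c_n=\sum_k d_k u_k^n\) with \(u_k=e^{i\omega_k\tau}\), builds a Padé approximant \(P(x)/Q(x)\) of the series \(\sum_n c_n x^n\) in \(x=z^{-1}\) from \(2K\) samples, evaluates it on the unit circle as a shape estimate, and, separately, takes the roots of the denominator plus a least-squares fit for the amplitudes as a parameter estimate. Measured FIDs are noisy and their number of components is unknown, which this sketch does not address; the derivative operator is also omitted. All function names are illustrative.

```python
import numpy as np

def pade_denominator(c, K):
    """Coefficients q_0..q_K (q_0 = 1) from sum_s q_s c_{n-s} = 0, n = K..2K-1."""
    A = np.array([[c[n - s] for s in range(1, K + 1)] for n in range(K, 2 * K)])
    q_tail = np.linalg.solve(A, -np.asarray(c[K:2 * K]))
    return np.concatenate(([1.0 + 0.0j], q_tail))

def pade_numerator(c, q):
    """p_r = sum_{s<=r} q_s c_{r-s}, r = 0..K-1, i.e. P = Q*G truncated."""
    K = len(q) - 1
    return np.array([sum(q[s] * c[r - s] for s in range(r + 1)) for r in range(K)])

def pade_spectrum(c, K, omega, tau):
    """Nonparametric (shape) estimate: P(x)/Q(x) at x = exp(-i omega tau)."""
    q = pade_denominator(c, K)
    p = pade_numerator(c, q)
    x = np.exp(-1j * omega * tau)
    # np.polyval wants the highest power first; p and q are stored lowest power first.
    return np.polyval(p[::-1], x) / np.polyval(q[::-1], x)

def pade_parameters(c, K):
    """Parametric estimate: poles u_k and amplitudes d_k."""
    q = pade_denominator(c, K)
    u = np.roots(q)                                  # roots of u^K + q_1 u^(K-1) + ... + q_K
    V = np.vander(u, N=len(c), increasing=True).T    # V[n, k] = u_k**n
    d, *_ = np.linalg.lstsq(V, np.asarray(c), rcond=None)
    return u, d

# Hypothetical noiseless three-component signal.
tau = 1e-3                                           # sampling time (s), assumed
omega_true = 2 * np.pi * np.array([50.0, 120.0, 210.0]) + 1j * np.array([5.0, 8.0, 12.0])
d_true = np.array([1.0, 0.6 * np.exp(0.5j), 0.3])
K = len(d_true)
n = np.arange(2 * K)                                 # 2K samples suffice without noise
c = (d_true * np.exp(1j * np.outer(n * tau, omega_true))).sum(axis=1)

u, d = pade_parameters(c, K)
omega_est = -1j * np.log(u) / tau                    # back to complex frequencies
order = np.argsort(omega_est.real)
print(omega_est[order], d[order])
```

For this exact signal, the envelope \(P/Q\) coincides with \(\sum_k d_k/(1-u_k x)\), and the components recovered from the poles and residues reproduce it, which is the noiseless analogue of the agreement between nonparametric and parametric estimates described above.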

The sought quantitative information from the examined patients can aid life-saving decision-making in differential diagnosis (benign versus malignant). Detection of cancer biomarkers, i.e. metabolites that can help distinguish benign from cancerous lesions, is of utmost importance in medical diagnostics. Elevated levels of lactate and choline metabolites are among the recognized cancer biomarkers for the brain, breast, prostate, ovary and cervix. For the prostate, citrate is also a recognized cancer biomarker (low in cancer, high in benign or normal peripheral zone). It is not yet widely established that citrate can also serve as a cancer biomarker for the ovary (low in cancerous and high in benign or normal). The NMR analysis in the present study supports the prospect of considering citrate as a potential cancer biomarker. Further investigations along these lines for a cohort of some forty patients could help put this feasibility-study-type proposal on a firmer basis.