Introduction

Spectroscopy probes the interplay between matter and radiation as a function of wavelength, and it is a non-destructive, harmless, non-contact, and rapid methodology compared with conventional chemometric approaches (Fu and Ying 2016; Zheng et al. 2017). Raman spectroscopy (RS) and Infrared spectroscopy (IS) are two of the most prominent techniques, with a plethora of applications spanning science, agriculture, and engineering. Raman is a light-scattering methodology in which a molecule scatters the incident light from an intense laser source. Most of the scattered light retains the same wavelength as the light source and therefore provides no useful information. The meaningful information is hidden in the small fraction of light scattered at shifted wavelengths, called Raman scattering (Chen et al. 2022b; Araújo et al. 2021).

IS observes molecules excited by an infrared light beam, yielding an infrared absorbance spectrum. Because the IS absorbance spectrum is a "fingerprint" of any (bio)chemical component, it provides intrinsic information about a substance that is essential for many investigations. Recently, RS and IS combined with Machine Learning (ML) and Dimensionality Reduction (DR) approaches have found numerous practical applications: quantitative and qualitative assessment of soil attributes to ensure fertility and productivity through the formulation and recommendation of tailored fertilizer compositions (Barra et al. 2021); guidance in diagnosing different types of cancer from spectra of blood plasma and skin samples (Chen et al. 2022b; Araújo et al. 2021; Mohamed Yousuff and Rajasekhara Babu 2022); detection of food adulteration, especially in seafood, honey, and edible oils, and discrimination of the quality and debasement of fuels such as diesel and gasoline (Dumancas and Ellis 2022; Owen et al. 2021; Zhao et al. 2022; Li et al. 2020; Wang et al. 2000a, 2000b); evaluation of the antibiotic susceptibility of microorganisms such as bacteria and identification of the diseases affecting crop leaves (Suleiman et al. 2022; Mohamed Yousuff et al. 2020); detection of microbial spoilage and classification of muscle foods (Ellis et al. 2004; Yan et al. 2021); quality assessment of products such as tea, coffee, and fruit-based beverages (Mishra et al. 2018; Hu et al. 2022; Bizzani et al. 2020); and chemical component analysis along with many other medical and health-related applications (Gao et al. 2022; Chen et al. 2022a; Ralbovsky et al. 2021; Liu et al. 2021).

Spectra usually contain many wavelength features, ranging from hundreds to thousands of dimensions. Classification approaches that utilize the entire spectrum become time-consuming, and extraneous information within the spectrum compromises the model's precision and stability. When the dimensionality of the input spectral features is very high, the modeling time and cost become prohibitively expensive. Owing to sampling cost in industrial and medical applications, the number of spectral observations or samples (\(s\)) collected for any scenario is usually lower than the number of features (\(f\)). Without a well-formed and well-conditioned observation matrix, i.e., \(s \gg f\), ML models cannot deliver accurate predictions. In other words, RS and IS spectra are high-dimensional data that require an effective DR approach to achieve better classification metrics (Mishra et al. 2018; Zheng et al. 2019).

Principal Component Analysis (PCA) is among the most widely implemented linear DR algorithms for High Dimensional Space (HDS) data. PCA is a projection-based approach that transforms the data points by mapping them onto orthogonal axes. PCA identifies the optimal linear combinations of the original spectral features such that the variance or dispersion along each transformed feature is maximized (Zhang et al. 2022). In the context of spectral data, an eigenvector indicates a direction or axis of the data, and the associated eigenvalue reflects the dispersion along that eigenvector: the greater the eigenvalue, the greater the variance along the corresponding eigenvector. The outcome of the PCA algorithm is the set of principal components, which are linear combinations of the actual wavelengths of the spectrum (Zhao et al. 2022).

Being linear, PCA cannot capture intricate nonlinear (e.g., polynomial) correlations between features. Thus, a major issue with PCA is that it fails to produce an efficient Low Dimensional Space (LDS) when the spectra are highly nonlinear (Liu et al. 2017). To overcome this problem, Kernel PCA (KPCA) was introduced to manage the nonlinear aspects of the spectra. KPCA helps separate data points whose decision boundaries are characterized by a nonlinearly separable function. The idea behind the kernel is to move to an HDS in which the decision boundary of the spectral features becomes linear. A general nonlinear combination of the original features generates a large number of new components or features after the kernel function is applied, which substantially increases the problem's computational complexity compared to PCA (Sun et al. 2019).

KPCA cannot outperform PCA if most of the data points in the spectral data are linearly separable; in that case, using a nonlinear kernel may degrade performance due to overfitting (Li et al. 2020). Multi-Dimensional Scaling (MDS) is a distance-preserving manifold learning method. Distance-preserving methods presuppose that a manifold is characterized by the pairwise distances between its data points. In these approaches, an LDS is created from an HDS such that pairwise distances between data points remain unchanged. MDS maintains spatial distances, whereas other approaches maintain graph distances. The dissimilarity matrix is computed from the input spectra; MDS takes this matrix and creates a corresponding mapping in an LDS, retaining the dissimilarities between data points as precisely as possible. Generating the dissimilarity matrix at each step of MDS requires a significant amount of processing resources, and it is difficult to incorporate new data into MDS (Mishra et al. 2018).

Isometric mapping (ISOMAP) is a nonlinear DR technique based on spectral theory that attempts to retain geodesic distances in the LDS. ISOMAP begins by constructing a neighborhood graph. The graph distance is used to estimate the geodesic distance between each pair of data points, an eigenvalue decomposition of the geodesic distance matrix is then performed, and the LDS of the dataset is subsequently determined. The geodesic distance is computed as the sum of path weights along the shortest path connecting two data points. When the manifold is not adequately sampled and has gaps, ISOMAP fails badly. Creating the neighborhood graph is challenging, and a small error in the parameters can have negative implications (Wang et al. 2020b; Mishra et al. 2018).

t-distributed Stochastic Neighbor Embedding (t-SNE) is a widely used DR and data visualization technique. t-SNE captures not only the local patterns of the HDS but also tries to maintain the global structure of the data, and it has a remarkable capacity to form well-defined, distinct clusters. A Student-t distribution is used to quantify the similarity between data points in the LDS, while a symmetric Gaussian-based probability distribution is used for the HDS (Wang et al. 2020a; Luo et al. 2021). When the LDS dimension exceeds 3, t-SNE has execution issues. Like other gradient descent-based algorithms, t-SNE tends to become trapped in local optima. t-SNE is also quite sensitive to the perplexity value and can produce misleading clusters, and the basic t-SNE implementation is slow owing to the nearest-neighbor searches it requires (Wang et al. 2021b).
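
For reference, the baseline DR techniques discussed above are all available in scikit-learn; the snippet below is an illustrative sketch, with placeholder data and parameter values (not those used in this study), of how 2D embeddings such as those compared later can be obtained.

```python
# Illustrative only: baseline DR techniques on a spectral matrix X of shape
# (n_samples, n_wavelengths); the data and parameter values are placeholders.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import MDS, Isomap, TSNE

X = np.random.rand(100, 300)  # stand-in for a preprocessed spectral dataset

embeddings = {
    "PCA":    PCA(n_components=2).fit_transform(X),
    "KPCA":   KernelPCA(n_components=2, kernel="rbf").fit_transform(X),
    "MDS":    MDS(n_components=2).fit_transform(X),
    "ISOMAP": Isomap(n_components=2, n_neighbors=15).fit_transform(X),
    "t-SNE":  TSNE(n_components=2, perplexity=30).fit_transform(X),
}
for name, emb in embeddings.items():
    print(name, emb.shape)  # each method yields an (n_samples, 2) low-dimensional space
```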

To overcome the above issues, we propose a Graph-based Neural Network Embedding (GNNE) approach to produce an appropriate and reliable LDS representation of HDS spectral inputs. GNNE starts by building a \(k\)-nearest-neighbors graph of the input data points and computing edge probabilities of the HDS observations. A Fully Connected Neural Network (FCNN) with a nonlinear activation function is then employed to obtain an embedding of the desired low dimensionality. The probability values of the LDS (latent space or embedding) are computed, and finally the difference between the HDS and LDS probability distributions is minimized using a cross-entropy cost function to extract efficient embeddings. The remainder of this work is organized as follows: the Method and Implementation section describes the spectral datasets, the spectra preprocessing strategies, the visualization of the preprocessed spectra, and the proposed methodology. The Experiments section discusses the procedure for extracting the LDS with the proposed technique and presents comparative visualizations of the 2-dimensional embeddings obtained with various DR techniques and GNNE on all the spectral datasets. The Results and Discussions section presents the spectra classification model, its performance metrics, and the DR evaluation metric. Finally, conclusions are drawn in the Conclusion section.

Method and implementation

Spectral datasets description

The spectra from five different spectral datasets are considered in our study to examine the performance of the proposed DR approach. The coffee, fresh meat, olive oil, and fruit purees datasets are available online at https://data.mendeley.com/datasets/frrv2yd9rg/1, and the COVID-19 Raman spectroscopy dataset is available at https://doi.org/10.6084/m9.figshare.12159924.v1. The five datasets are described as follows:

  1. (i)

    Coffee: Coffee beans collected from different regions of the world are roasted to varying degrees and processed into a lyophilized powder. The coffee powder is stored in an air-tight plastic container at -20 °C before being subjected to spectroscopy. Fourier Transform Infrared Spectroscopy (FTIS) is then used to generate the spectra of 56 coffee samples belonging to two classes, namely arabica (29 samples) and robusta (27 samples). Each coffee spectrum consists of 286 features in the wavenumber range 811 to 1910 cm−1 (Downey et al. 1997).

  2. (ii)

    Fresh Meat: Meat from three categories, namely chicken, turkey, and pork, of approximately 100 g each, is collected over a period of 14 days. Pork chops are used, while breast pieces of chicken and turkey are preferred for the spectroscopy. After removing skin and bones, all the meat is softened in a blender and cleansed with a 2% Triton-X solution and distilled water. Mid-IS spectra are acquired for 20 samples from each category under frozen and thawed conditions. Each fresh meat spectrum consists of 448 features in the wavenumber range 1006 to 1867 cm−1 (Al-Jowder et al. 1997).

  3. (iii)

    Olive Oil: Sixty specimens of authentic virgin olive oils are obtained from four European nations well known for their oil production. The samples comprise 10, 17, 8, and 25 specimens from Greece, Italy, Portugal, and Spain, respectively. The data are collected over two distinct periods of around 14 days. Before and between spectral measurements, the specimens are kept in the dark at room temperature. Each virgin olive oil spectrum consists of 570 features in the wavenumber range 799 to 1896 cm−1 (Tapp et al. 2003).

  4. (iv)

    Fruit Purees: Mid-IS spectra are measured on two different types of authenticated fruit purees. The first type contains 351 spectra belonging to the ‘Strawberry’ category, prepared by the researchers from fresh strawberries and then subjected to spectroscopy. The second category contains 632 spectra belonging to the ‘Non-Strawberry’ class, consisting of adulterated strawberry purees mixed with other fruit juices and sugar solutions. Each fruit puree spectrum consists of 235 features in the wavenumber range 900 to 1802 cm−1 (Holland et al. 1998).

  5. (v)

    COVID-19: Blood serum is obtained from confirmed COVID-19 patients, healthy persons, and suspected individuals. RS analysis is performed, and spectra are measured on all types of serum specimens. A total of 465 spectra are acquired, of which 159 belong to the COVID-19 category, 156 to the suspected class, and 150 to the healthy class. Each COVID-19 spectrum consists of 900 features in the wavenumber range 400 to 2112 cm−1 (Yin et al. 2019).

Spectra preprocessing

The spectral data analysis procedure yields better results when effective preprocessing steps are applied prior to the analysis phase. A sequence of preprocessing steps, including background noise elimination, normalization, smoothing, and baseline correction, is carried out to ensure finer and enhanced classification metrics (Khan et al. 2018). All spectra are smoothed by applying a Savitzky-Golay filter. A digital filter such as the Savitzky-Golay filter can smooth the data, enhancing its clarity, sharpness, and resolution without altering the spectrum's underlying pattern. In this convolution approach, a low-degree polynomial is fitted to successive subsets of adjacent data points using the linear least-squares technique to obtain the smoothed output (Mohamed Yousuff and Rajasekhara Babu 2022; Schafer 2011). The preprocessed spectra and their corresponding average spectra for all the datasets are depicted in Figs. 1, 2, 3, 4, and 5, respectively.
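
As an illustration, the Savitzky-Golay smoothing step can be performed with SciPy as sketched below; the window length and polynomial order shown are placeholder values, since the exact filter settings are not reported here.

```python
# Illustrative preprocessing sketch: Savitzky-Golay smoothing of one spectrum with SciPy.
import numpy as np
from scipy.signal import savgol_filter

spectrum = np.random.rand(286)             # stand-in for one raw coffee spectrum
smoothed = savgol_filter(spectrum,
                         window_length=11,  # odd number of neighbouring points per local fit
                         polyorder=3)       # low-degree polynomial fitted by least squares
```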

Fig. 1
figure 1

Preprocessed FTIS spectra of the arabica and robusta coffee varieties (a) Set of all preprocessed coffee spectra (b) Preprocessed average coffee spectra

Fig. 2
figure 2

Preprocessed Mid-IS spectra of chicken, pork and turkey fresh meat (a) Set of all preprocessed fresh meat spectra (b) Preprocessed average fresh meat spectra

Fig. 3
figure 3

Preprocessed FTIS spectra of olive oils from Greece, Italy, Portugal and Spain (a) Set of all preprocessed olive oil spectra (b) Preprocessed average olive oil spectra

Fig. 4
figure 4

Preprocessed Mid-IS spectra of strawberry and non-strawberry fruit purees (a) Set of all preprocessed fruit purees spectra (b) Preprocessed average fruit purees spectra

Fig. 5
figure 5

Preprocessed RS spectra of COVID-19, healthy and suspected blood serum (a) Set of all preprocessed blood serum spectra (b) Preprocessed average blood serum spectra

Methodology

Similar to t-SNE and related techniques, we presume that data points which are close to each other in the HDS, according to a pertinent metric, should also be close to one another in the embedding space. Conversely, data points which are far apart in the HDS should remain correspondingly separated in the LDS. We suppose that a metric such as the Euclidean distance in the HDS is adequate to describe the manifold on which the input observations lie (van der Maaten and Hinton 2008). The objective of the proposed GNNE approach is as follows: given \(D\)-dimensional spectral data points \(S \in {\mathbb{R}}^{D}\), create a \(d\)-dimensional LDS or embedding \(E \in {\mathbb{R}}^{d}\) (\(d\ll D\)) such that data points close together in \(S\) (for example \({S}_{i}\) and \({S}_{j}\)) are also close to one another in \(E\) (\({E}_{i}\) and \({E}_{j}\)). GNNE computes a nearest-neighbors graph and edge probability values for each input spectral data point, followed by extraction of the required latent space using an FCNN with a nonlinear activation function. The cross-entropy cost function is used to reduce the difference between the high-dimensional and low-dimensional probability distributions, resulting in an effectual embedding that can be further utilized for visualization and classification of the spectra.

Data graph in the HDS (Spectral Input Space)

Consider the spectral dataset \(S=\left[{s}_{1},{s}_{2},\dots ,{s}_{n}\right]\in {\mathbb{R}}^{D \times N}\), where \(N\) is the number of spectral observations and \(D\) is their dimensionality. We build a \(k\)-Nearest Neighbors (\(k\)NN) graph (\(k\) is treated as a hyperparameter) for the given spectral input space (Dong et al. 2011). The \(j\)-th neighbor of \({s}_{i}\) is denoted \({s}_{i,j}\), so \({\eta }_{i} :=\{{s}_{i,1}, {s}_{i,2},\dots ,{s}_{i,k}\}\) represents the set of neighbor data points for the observation \({s}_{i}\). The neighbor affinity among the data points is treated as a random, probabilistic relationship. The Radial Basis Function (RBF) or Gaussian kernel is used to compute the similarity between the input data points in the HDS (van der Maaten and Hinton 2008; Hinton and Roweis 2002; Ghojogh et al. 2020). The probability that a spectral data point \({s}_{i}\) has \({s}_{j}\) as its neighbor can be calculated from the similarity of these data points, as given in Eq. 1.

$${\mathbb{p}}_{j|i} =\left\{\begin{array}{lc}\mathrm{exp}\left(- \frac{\parallel {s}_{i}-{s}_{j}{\parallel }_{2}-{\zeta }_{i}}{{\psi }_{i}}\right),&if\;{s}_{j} \in {\eta }_{i}\\ 0,&otherwise\end{array}\right.$$
(1)

where \(\parallel \cdot {\parallel }_{2}\) denotes the \({L}_{2}\) norm and \({\zeta }_{i}\) is the distance between \({s}_{i}\) and its nearest neighbor data point, given by Eq. 2.

$${\zeta }_{i}= \mathrm{min}\left\{\parallel {s}_{i}-{s}_{i,j}{\parallel }_{2} \;\middle|\; 1\le j\le k\right\}$$
(2)

Here \({\psi }_{i}\) is a scaling variable, calculated so as to normalize the total similarity of the data point \({s}_{i}\) to its \(k\)NNs. Using binary search, \({\psi }_{i}\) is determined so as to satisfy Eq. 3.

$$\sum_{j=1}^{k}\mathrm{exp}\left(- \frac{\parallel {s}_{i}-{s}_{i,j}{\parallel }_{2}-{\zeta }_{i}}{{\psi }_{i}}\right)= {\mathrm{log}}_{2}(k)$$
(3)

t-SNE performs a similar scale search using entropy as the perplexity. Since the scale for a data point in a crowded section of the dataset becomes small while the scale for a data point in a sparsely populated area becomes large, this search causes the neighborhoods of different data points to behave similarly. In other words, we presume that the observations are distributed uniformly on the underlying low-dimensional manifold. Equation 1 gives a directional similarity measure, whereas Eq. 4 gives the symmetric measure of similarity between data points \({s}_{i}\) and \({s}_{j}\) in the high-dimensional spectral input space.

$${\mathbb{R}}\ni {\mathbb{p}}_{ij}={\mathbb{p}}_{i|j}+ {\mathbb{p}}_{j|i}- {\mathbb{p}}_{i|j}{\mathbb{p}}_{j|i}$$
(4)
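
For clarity, the construction of the HDS edge probabilities in Eqs. 1 to 4 can be sketched in NumPy as below. This is an illustrative sketch only: it assumes a precomputed \(k\)NN index (e.g., from NND), the neighbor lists are assumed not to include the point itself, and the dense probability matrix, binary-search bounds, and iteration count are choices of this sketch rather than reported implementation details.

```python
# Minimal NumPy sketch of Eqs. 1-4 (HDS edge probabilities), assuming a precomputed k-NN index.
import numpy as np

def hds_probabilities(S, knn_idx, knn_dist, k):
    """S: (n, D) spectra; knn_idx/knn_dist: (n, k) neighbour indices and L2 distances."""
    n = S.shape[0]
    P = np.zeros((n, n))                                   # directional probabilities p_{j|i}
    for i in range(n):
        zeta = knn_dist[i].min()                           # Eq. 2: distance to nearest neighbour
        lo, hi = 1e-8, 1e4                                 # binary search for psi_i satisfying Eq. 3
        for _ in range(64):
            psi = (lo + hi) / 2.0
            total = np.exp(-(knn_dist[i] - zeta) / psi).sum()
            if total > np.log2(k):
                hi = psi                                   # total similarity grows with psi
            else:
                lo = psi
        P[i, knn_idx[i]] = np.exp(-(knn_dist[i] - zeta) / psi)   # Eq. 1
    return P + P.T - P * P.T                               # Eq. 4: symmetric edge probabilities
```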

Data graph in the LDS (Embedding)

Let the embeddings of the spectral data points be \(E=[{e}_{1},{e}_{2},\dots ,{e}_{n}] {\in {\mathbb{R}}}^{d \times n}\), where \(d\) is the dimensionality of the LDS, which is always considerably smaller than that of the HDS or spectral input space (\(d\ll D\)), and \(n\) is the number of data points (\(N:=n\)). Note that \({e}_{i}\) is the LDS counterpart of \({s}_{i}\). In the LDS, the probability that a data point \({e}_{j}\) is the neighbor of \({e}_{i}\) can be calculated from the similarity of these data points, as given in Eq. 5.

$${\mathbb{R}}\ni {\mathbb{q}}_{ij}={\left(1+u\parallel {e}_{i}-{e}_{j}{\parallel }_{2}^{2v}\right)}^{-1}$$
(5)

The variables \(u>0\) and \(v>0\) are two user-controlled hyperparameters. We considered \(u\approx 1.9289\) and \(v\approx 0.7914\); it has been empirically demonstrated that choosing \(u=v=1\) instead has no qualitative effect on the outcomes (Böhm et al. 2022).

Neural network-based embedding

The proposed technique tries to match the data graph in the HDS with the data graph in the LDS. In other words, we interpret Eqs. 4 and 5 as probability distributions and decrease the disparity between them so that the similarity of data points in the LDS resembles the similarity of data points in the HDS. The modified cross-entropy cost function (\({\mathbb{C}}\)) given in Eq. 6 (Tang et al. 2016) is used as a measure of the discrepancy between these two probability distributions.

$${\mathbb{C}}=\sum_{i=1}^{n}\sum_{j=1,j\ne i}^{n}\left({\mathbb{p}}_{ij }\mathrm{ln}\left(\frac{{\mathbb{p}}_{ij}}{{\mathbb{q}}_{ij}}\right)+(1-{\mathbb{p}}_{ij}) \mathrm{ln}\left(\frac{1-{\mathbb{p}}_{ij}}{1-{\mathbb{q}}_{ij}}\right)\right)$$
(6)

where \(\mathrm{ln}\left( \cdot \right)\) denotes the natural logarithm. The first term of Eq. 6 specifies the attractive force that pulls the embeddings of neighboring data points towards one another. This term is nonzero only when \({\mathbb{p}}_{ij}\ne 0\), which covers three possibilities: \({s}_{i}\) is a neighbor of \({s}_{j}\), \({s}_{j}\) is a neighbor of \({s}_{i}\), or both. The second term in Eq. 6 is the repulsive force that separates the embeddings of data points that are far apart or non-neighbors. The cost function of the algorithm serves as the loss function of the neural network (\({f}_{\vartheta } (\cdot)\)), and the weights \(\vartheta\) (parameters) of the network are updated during training using backpropagation of the loss. Instead of optimizing the cost over the entire dataset at once, optimization is performed in mini-batches so that the model can handle many more observations. The nonlinearity in the layers of the neural network is provided by an effective nonlinear activation function; hence the model can readily handle highly nonlinear inputs. The steps and procedures involved in the proposed GNNE approach are given in Algorithm 1.
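
As an illustration, Eqs. 5 and 6 can be expressed as a mini-batch loss in TensorFlow as sketched below; the pairing of embeddings, the small epsilon added for numerical stability, and the function name are assumptions of this sketch, not details reported for the actual implementation.

```python
# Minimal TensorFlow sketch of Eqs. 5 and 6 over a mini-batch of embedding pairs.
import tensorflow as tf

def gnne_loss(p_ij, e_i, e_j, u=1.9289, v=0.7914, eps=1e-7):
    """p_ij: (batch,) HDS edge probabilities; e_i, e_j: (batch, d) embedding pairs."""
    sq_dist = tf.reduce_sum(tf.square(e_i - e_j), axis=1)
    q_ij = 1.0 / (1.0 + u * tf.pow(sq_dist, v))                       # Eq. 5
    attraction = p_ij * tf.math.log((p_ij + eps) / (q_ij + eps))      # pulls neighbours together
    repulsion = (1.0 - p_ij) * tf.math.log((1.0 - p_ij + eps) /
                                           (1.0 - q_ij + eps))        # pushes non-neighbours apart
    return tf.reduce_sum(attraction + repulsion)                      # Eq. 6
```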

Algorithm 1
figure a

Graph based Neural Network Embeddings

Hyperparameters

Because neural networks contain many parameters, some important hyperparameters are fixed before training the model. A Bayesian optimization technique is used for the selection of the hyperparameters in this work. The hyperparameters to be tuned for the neural network include the number of neurons, activation function, learning rate, batch size, and number of epochs. The proposed algorithm has the following four hyperparameters:

  1. 1.

    \(k\), the number of nearest neighbors to take into account when estimating the local (Euclidean) distance metric. The \(k\) parameter provides a quantifiable indication of how effectively the embedding has kept the crucial local structure of the input HDS data. By altering the value of \(k\), we can further examine how structure preservation changes during the shift from exclusively local to non-local to global structure. \(k\) thus embodies a trade-off between fine-grained and large-scale manifold, or nonlinear, features. Smaller values ensure that fine-grained nonlinear structure is precisely captured, while larger values capture large-scale nonlinear structure. With smaller \(k\) values, the manifold begins to fragment into a multitude of small, interconnected components. It is intuitive and empirically evident that smaller values between 5 and 50 suit datasets with fewer observations, whereas \(k\) can exceed one hundred for large datasets.

  2. 2.

    \(d\), the dimension of the expected embeddings or LDS. Embeddings of 2 or 3 dimensions are essential for visualizing the HDS in an LDS and deriving significant insights, whereas an LDS of more than 3 dimensions can be utilized for better and more efficient classification tasks, achieving higher performance metric values.

  3. 3.

    \(e-dist\), the intended separation between adjacent points in the embedding space. \(e-dist\) regulates how tightly data points can be packed together in the LDS representation. Smaller values of \(e-dist\) will likely result in crowded sections but will depict the manifold structure more accurately. Larger values of \(e-dist\) drive the embedding to disperse the data points further, facilitating visualization by preventing overlapping and overplotting concerns. We consider \(e-dist\) to be a purely aesthetic factor that influences the visual appearance of the LDS, and it is therefore most relevant for visualization tasks. Very high values of \(e-dist\) cause otherwise distinguishable clusters to be squeezed together, diminishing the boundaries between them.

  4. 4.

    \(n-epochs\), the number of epochs used during the training phase to optimize the LDS representation. In every epoch, each input observation in the batches of the training dataset contributes to fine-tuning the model weights, minimizing the error and producing a better embedding as output.

Experiments

The GNNE algorithm takes five inputs: the HDS dataset, \(k\), \(d\), \(e-dist\), and \(n-epochs\). For all the experiments on the five aforementioned datasets we considered \(k=15\), \(d=\left\{2, 3, 5, 10, 15, 20\right\}\), \(e-dist=0.1\), and \(n-epochs=10\). The Nearest-Neighbor-Descent (NND) technique offers a cost-effective approximation of the \(k\)-nearest-neighbor computation. Given the error inherent in DR techniques, such an approximation is more than sufficient for these tasks. Although no theoretical complexity bounds have been established for NND, its authors report an empirical complexity of \(O({N}^{1.14})\) (Dong et al. 2011). The main advantage of NND is its versatility: it can be applied to any reasonable dissimilarity measure and remains effective even in an HDS. Hence, NND with the Euclidean distance metric is used to construct the \(k\)NN graph. Based on the probabilities obtained from the \(k\)NN graph, a TensorFlow dataset is built by iterating over batches of \(k\)NNs. These data are then fed to an FCNN for optimization and efficient generation of embeddings.
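
A minimal sketch of this graph-construction step, assuming the pynndescent implementation of NND and placeholder data, is given below; the variable names are illustrative.

```python
# Illustrative sketch of the k-NN graph construction step using the pynndescent
# implementation of Nearest-Neighbor-Descent (Dong et al. 2011).
import numpy as np
from pynndescent import NNDescent

S = np.random.rand(465, 900)                 # stand-in for an HDS spectral dataset
index = NNDescent(S, n_neighbors=15, metric="euclidean")
knn_idx, knn_dist = index.neighbor_graph     # (n, k) neighbour indices and distances
```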

An FCNN with four layers is used to obtain the LDS. The input layer receives the spectral observations. The two hidden layers that follow the input layer have 128 and 64 neurons, respectively, with the Scaled Exponential Linear Unit (SELU) nonlinear activation function. The SELU activation function gives the layers of the neural network self-normalizing characteristics. The SELU activation function and its derivative are given in Eqs. 7 and 8 (Klambauer et al. 2017).

$$SELU\left(u\right)=\lambda\left\{\begin{array}{lc}u,&if\;u>0\\\alpha e^u-\alpha,&if\;u\leq0\end{array}\right.$$
(7)
$$SELU'(u)=\lambda\left\{\begin{array}{lc}1,&if\;u>0\\\alpha e^u,&if\;u\leq0\end{array}\right.$$
(8)

where the parameters \(\lambda\) and \(\alpha\) are set to 1.0507 and 1.6733, respectively, values empirically shown to yield better performance from the FCNN (Klambauer et al. 2017). Alpha dropout layers with a dropout rate of 0.1 are placed between the dense layers of the FCNN to promote regularization and avoid over-fitting. These layers set 10% of the input units to zero at each iteration but are designed to work with SELU, retaining the self-normalization characteristic by preserving the mean and variance of the inputs. The SELU activation function is preferred for the following reasons: (i) it can take both negative and positive values, allowing the mean to be controlled; (ii) it has saturation zones, allowing it to damp large variances from preceding layers; (iii) it has a slope greater than one, which enables it to raise low variances from preceding layers; and (iv) these properties form a seamless curve that guarantees a stable point between variance dampening and variance raising.

The backpropagation approach is the primary technique used to train the FCNN. Weights are updated after each iteration, decreasing the error rate and assuring the model's stability. The Adamax optimizer with learning rate = 0.001, beta1 = 0.9, and beta2 = 0.999 is chosen for faster convergence (Kingma and Ba 2015). The last layer of the FCNN is the point from which the embeddings are extracted; hence the number of units in this layer equals the dimension of the LDS. We considered six LDS dimensions, namely 2, 3, 5, 10, 15, and 20. The 2-dimensional (2D) and 3-dimensional (3D) LDS are helpful for visualization tasks, whereas the higher dimensions are intended for the classification task. 2D plots of the various DR techniques and the GNNE approach on the different datasets are illustrated in Figs. 6, 7, 8, 9, and 10. All DR techniques used in our work are implemented in Python using Scikit-learn (Pedregosa et al. 2011), and all the visualizations are aided by the NumPy (Walt et al. 2011), Pandas (McKinney 2010), Matplotlib (Hunter 2007), and Seaborn (Waskom et al. 2014) libraries.
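
Putting the above architecture and training choices together, a minimal Keras sketch of the encoder might look as follows; the lecun_normal initializer and the omission of the custom GNNE loss wiring are simplifications of this sketch rather than reported details.

```python
# Illustrative Keras sketch: input layer, two SELU hidden layers with alpha dropout,
# and a d-dimensional embedding layer, with the Adamax optimizer settings quoted above.
import tensorflow as tf

def build_encoder(input_dim, d):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),                 # receives the spectral observations
        tf.keras.layers.Dense(128, activation="selu",
                              kernel_initializer="lecun_normal"),
        tf.keras.layers.AlphaDropout(0.1),                  # keeps the self-normalizing property
        tf.keras.layers.Dense(64, activation="selu",
                              kernel_initializer="lecun_normal"),
        tf.keras.layers.AlphaDropout(0.1),
        tf.keras.layers.Dense(d),                           # embedding layer: d = LDS dimension
    ])
    # The custom cross-entropy loss of Eq. 6 would be supplied during training (omitted here).
    model.compile(optimizer=tf.keras.optimizers.Adamax(
        learning_rate=0.001, beta_1=0.9, beta_2=0.999))
    return model

encoder = build_encoder(input_dim=900, d=2)  # e.g. COVID-19 spectra mapped to a 2D embedding
```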

Fig. 6
figure 6

2D visualization of coffee spectra using different DR techniques (a) KPCA (b) MDS (c) ISOMAP (d) t-SNE (e) Proposed Approach

Fig. 7
figure 7

2D visualization of fresh meat spectra using different DR techniques (a) KPCA (b) MDS (c) ISOMAP (d) t-SNE (e) Proposed Approach 

Fig. 8
figure 8

2D visualization of olive oil spectra using different DR techniques (a) KPCA (b) MDS (c) ISOMAP (d) t-SNE (e) Proposed Approach

Fig. 9
figure 9

2D visualization of fruit purees spectra using different DR techniques (a) KPCA (b) MDS (c) ISOMAP (d) t-SNE (e) Proposed Approach

Fig. 10
figure 10

2D visualization of COVID-19 spectra using different DR techniques (a) KPCA (b) MDS (c) ISOMAP (d) t-SNE (e) Proposed Approach

The coffee and fruit purees spectra have binary labels, whereas the fresh meat, olive oil, and COVID-19 spectra are categorized into multiple labels. We can infer from the preprocessed spectral plots (Figs. 1, 2, 3, 4 and 5) that all datasets exhibit a high degree of nonlinearity. The GNNE technique handles the nonlinear data points considerably better than the existing methods, which is apparent in the 2D visualizations of Figs. 6, 9, and 10. All the techniques provide an interpretable 2D visualization for the fresh meat spectra, as shown in Fig. 7. Additionally, the proposed approach is able to maintain the compactness between spectra of the same category. Owing to the numerous nonlinear spectra in the olive oil dataset, all the techniques show overlaps when projecting the HDS into 2D, as illustrated in Fig. 8.

Results and discussions

The DR techniques are further evaluated using classification task performance metrics and trustworthiness. The performance metrics such as accuracy, precision, recall, F1-score, and Matthew's Correlation Coefficient (MCC) are evaluated on 2D, 3D, 5D, 10D, 15D, and 20D LDS of all the spectral datasets. Coffee and fruit purees spectra are meant for binary classification, whereas the other three datasets are subjected to multi-class classification.

Spectra classification model and evaluation

Random Forest Classifier (RFC) is a predominant and well-established model for spectral feature classification (Breiman 2001; Gomes Marques et al. 2021; Wang et al. 2021a; Zhou et al. 2020). RFC is a collection of Decision Tree (DT) classifiers in which each tree takes as input an independent random vector sampled from the same distribution for all DTs in the forest. The fundamental principle of RFC is that a number of weak learners (DTs) can be combined to create a strong learner. By introducing randomization into the sample selection process via Bootstrap resampling, many distinct trees are created, making the RFC less susceptible to over-fitting. For the classification task, the input spectral features are fed to every DT in the forest; each DT provides a classification, or vote, for a particular category or class, and the forest selects the classification with the most votes (Breiman 2001). Even in noisy situations, RFC detects significant features adequately, and it is adept at handling the HDS of the spectral features (Ghebleh Goydaragh et al. 2021; Wójtowicz et al. 2021).

\(K\)-fold cross-validation is a statistical method used to assess a model's performance. First, \(K\) segments are created from the overall set of spectral observations. The data segments are then assigned to training and testing batches randomly: the model picks one segment for testing and the remaining \(K-1\) segments for training. In this manner, the model predicts the observations consecutively by repeating the same procedure \(K\) times. We considered \(K=10\); hence the whole spectral dataset is split into 10 subgroups, resulting in tenfold cross-validation. During the training process, the model learns the salient features of the LDS spectra obtained after DR and then predicts the labels of the data points in the testing phase, ultimately producing the classification.
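
A minimal scikit-learn sketch of this evaluation protocol, with placeholder embeddings, labels, and forest size, is shown below.

```python
# Illustrative sketch: Random Forest Classifier evaluated with 10-fold cross-validation
# on low-dimensional embeddings; the data, labels, and number of trees are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

E = np.random.rand(465, 10)             # stand-in for 10-dimensional GNNE embeddings
y = np.random.randint(0, 3, size=465)   # stand-in labels (e.g. COVID-19 / suspected / healthy)

rfc = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(rfc, E, y, cv=10, scoring="accuracy")
print(scores.mean())                    # mean tenfold cross-validation accuracy
```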

The confusion matrix consists of four terms: (i) True Positive (TP), (ii) False Positive (FP), (iii) False Negative (FN), and (iv) True Negative (TN). A superior classification model contains a greater number of TPs and TNs, whereas FP and FN values represent the model's errors and misclassifications. These four confusion matrix values are the essential factors for computing the metrics accuracy, precision, recall, F1-score, and MCC using Eqs. 9 to 13. MCC computes the correlation between the actual and predicted categories, yielding a value between -1 and 1; a value close to 1 indicates a good score, which can only be achieved if the model is precise and reliable in all confusion matrix terms (Chicco et al. 2021). We computed the Micro Average Precision (MAP) and Micro Average Recall (MAR) for the multi-class spectra as given in Eqs. 14 and 15.

Various evaluation metrics of the classification tasks performed on the 2D, 3D, 5D, 10D, 15D, and 20D LDS spectra are given in Tables 1, 2, 3, 4, and 5, whereas the accuracy and MCC metrics are depicted in Figs. 11 and 12. All the DR techniques improve the performance metrics, which is clearly visible in the results. The proposed technique enhances the performance metrics significantly beyond the other DR techniques, especially in the case of multi-class spectra classification.

$$Accuracy=\frac{TN+TP}{TP+TN+FP+FN}$$
(9)
$$Precision=\frac{TP}{FP+TP}$$
(10)
$$Recall=\frac{TP}{FN+ TP}$$
(11)
$$F1\text{-}Score=2\left(\frac{Precision*Recall}{Precision+Recall}\right)$$
(12)
$$MCC =\frac{\left(TN* TP\right)-(FN*FP)}{\sqrt{\left(FP+TP\right)*\left(FN+ TP\right)*\left(FP+TN\right)*(FN+TN)}}$$
(13)
$$MAP=\frac{(T{P}_{clas{s}_{1}}+T{P}_{clas{s}_{2}}\dots +T{P}_{clas{s}_{n}})}{(F{P}_{clas{s}_{1}}+F{P}_{clas{s}_{2}}\dots +F{P}_{clas{s}_{n}})+( T{P}_{clas{s}_{1}}+T{P}_{clas{s}_{2}}\dots +T{P}_{clas{s}_{n}})}$$
(14)
$$MAR=\frac{(T{P}_{clas{s}_{1}}+T{P}_{clas{s}_{2}}\dots +T{P}_{clas{s}_{n}})}{(F{N}_{clas{s}_{1}}+F{N}_{clas{s}_{2}}\dots +F{N}_{clas{s}_{n}})+( T{P}_{clas{s}_{1}}+T{P}_{clas{s}_{2}}\dots +T{P}_{clas{s}_{n}})}$$
(15)
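
For reference, the metrics in Eqs. 9 to 15 correspond to standard scikit-learn calls; the snippet below is an illustrative sketch with placeholder labels rather than the cross-validated predictions of this study.

```python
# Illustrative computation of the classification metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef)

y_true = [0, 1, 2, 2, 1, 0]   # placeholder ground-truth labels
y_pred = [0, 1, 2, 1, 1, 0]   # placeholder predicted labels

acc  = accuracy_score(y_true, y_pred)                        # Eq. 9
map_ = precision_score(y_true, y_pred, average="micro")      # Eq. 14 (micro-average precision)
mar  = recall_score(y_true, y_pred, average="micro")         # Eq. 15 (micro-average recall)
f1   = f1_score(y_true, y_pred, average="micro")             # Eq. 12
mcc  = matthews_corrcoef(y_true, y_pred)                     # Eq. 13
```
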
Table 1 Coffee spectra classification metrics
Table 2 Fresh meat spectra classification metrics
Table 3 Olive Oil spectra classification metrics
Table 4 Fruit purees spectra classification metrics
Table 5 COVID-19 spectra classification metrics
Fig. 11
figure 11

Accuracy metric of various spectral datasets embeddings (a) Coffee spectra (b) Fresh meat spectra (c) Olive oil spectra (d) Fruit purees spectra (e) COVID-19 spectra

Fig. 12
figure 12

MCC metric of various spectral datasets embeddings (a) Coffee spectra (b) Fresh meat spectra (c) Olive oil spectra (d) Fruit purees spectra (e) COVID-19 spectra

Trustworthiness \(T(k)\)  

The degree to which an LDS retains the local structure of the original feature set is a measure of its trustworthiness. It is quantified by examining how closely the nearest neighbors of each data point in the LDS resemble those of the corresponding data point in the original HDS. Let \({\mathbb{N}}\) be the total number of observations in the input dataset and \(r(x,y)\) be the rank of data point \(y\) in the ordering based on distance from \(x\) in the original HDS. \({\mathbb{Z}}_{k}(x)\) is the set of the \(k\) data points closest to the data point \(x\) in the LDS. The trustworthiness measure is given in Eq. 16 (Venna et al. 2001; Venna and Kaski 2006).

$$T\left(k\right)=1- \frac{2}{{\mathbb{N}}k \left(2{\mathbb{N}}-3k-1\right)}\sum_{x=1}^{\mathbb{N}}\;\sum_{y \in {\mathbb{Z}}_{k}\left(x\right)}\left(r\left(x,y\right)-k\right)$$
(16)

The trustworthiness scale ranges from 0 to 1, with 1 being the most trustworthy. \(T(k)\) values for the 2D, 3D, 5D, 10D, 15D, and 20D embeddings of the various DR techniques are computed and depicted in Fig. 13. The proposed method is as trustworthy as the other DR techniques, achieving \(T(k)\) values close to 1.
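
For reference, \(T(k)\) can be computed with scikit-learn's built-in trustworthiness function, as sketched below with placeholder data.

```python
# Illustrative computation of the trustworthiness measure T(k) with scikit-learn.
import numpy as np
from sklearn.manifold import trustworthiness

X = np.random.rand(56, 286)                 # stand-in for the coffee spectra (HDS)
E = np.random.rand(56, 2)                   # stand-in for a 2D embedding of the same samples
T = trustworthiness(X, E, n_neighbors=5)    # value in [0, 1]; closer to 1 is better
```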

Fig. 13
figure 13

Trustworthiness metric of various spectral datasets embeddings (a) Coffee spectra (b) Fresh meat spectra (c) Olive oil spectra (d) Fruit purees spectra (e) COVID-19 spectra

Conclusion

In this study, we presented a dimensionality reduction approach for spectroscopy spectra based on graph-based neural network embeddings. The spectral data collected from various sources and applications are high dimensional in nature, and the classification performance on such spectra can be enhanced by effectively reducing their dimensions. The proposed technique was applied to five different types of spectroscopy data, and its results were compared with prominent existing DR techniques. The 2D visualizations of the spectral datasets using our approach showed competitive and better LDS visualizations of the HDS. The nonlinearity present in the data is handled efficaciously using a nonlinear activation function; as a result, all the performance metrics of the classification task, including accuracy and MCC, are remarkably improved. The multi-class classification of spectra showed slightly better outcomes than the binary classification. A trustworthiness value close to 1 confirms that the HDS features of the spectral observations are well preserved in the latent space. Further access to more spectral data points, especially in medical subdomains, would be an advantage in training a more reliable model. Novel nonlinear activation functions can be explored in the future to manage high-dimensional and nonlinear spectra more efficiently.