1 Introduction

Deep learning methods [37] are transforming the way we approach many engineering and biomedical areas both for image and signal processing [2, 25], recognition of multi-dimensional signal components, and their classification [34]. In particular, artificial intelligence and convolutional neural networks offer important perspectives for the analysis of medical data [8, 39, 41].

When it comes to stomatology, machine learning methods [11] are used for tasks including automatic spatial detection of teeth positioning [3, 5, 9] during oral operations and analysis of periapical radiographs. Specific studies analyze the contrast between sound and lesion areas on occlusal surfaces for different wavelengths of the near-infrared light [23]. Early success has demonstrated that selected computational methods also allow improvements over the standard visual diagnostics. The associated applications include the detection of the dental plaque causing many oral diseases [6, 22] and the analysis of dental caries that are characterized by demineralization of hydroxyapatite crystals and destruction of collagen matter in dental tissues [20].

The need for the early detection of this pathological situation motivates the study of different methods for the examination of tissues, including spectroscopy, radiography, and conductance measurement. Raman spectroscopy [12], different mathematical methods, and neural network models are extensively studied in this connection for the analysis of recorded signals and images as well. The classification of data segments of biomedical signals may be performed by classical clustering methods [4, 32, 33] or by deep learning algorithms [7, 24, 36]. Standard clustering and classification methods, including decision trees (DT), the k-nearest neighbors method (kNM), support vector machine (SVM), Bayesian methods, and two-layer neural network (NN) algorithms, have been successfully used in this area [14, 19, 31] to detect different kinds of disorders.

Deep learning methods based upon multilayer neural networks are often trained over an extensive number of signals. However, the further problems that need to be solved prior to their efficient use include the selection of data sets such that the individual classes have similar sizes [40]. Problems of class imbalance and associated classification issues [21] are often reduced by data augmentation methods to enlarge the extent of the training data sets. This approach was tested also in stomatology [17, 18] for analysis of dental tissue changes and their classification by convolutional neural networks.

Fig. 1
figure 1

Principle of diffuse reflectivity spectroscopy: a block diagram of the reflectivity recording; b data processing

The present paper is devoted to the classification of reflectivity data recorded for healthy dental tissues and caries, as outlined in Fig. 1. This approach is often applied as a tool for surface defect analysis [42] and the detection of dental disorders [3, 4]. The goal of the present study is to compare deep learning and standard classification methods in this area, so as to establish the extent to which artificial intelligence methods can be reliably used in early diagnosis of dental disorders. This promises to pave the path for the improvement in treatment outcomes and a reduction in the need for invasive procedures.

2 Methods

2.1 Data acquisition

Measurement of the interaction of the light in biological tissues is one type of optical spectroscopy that can be used for the noninvasive or minimally invasive evaluation of tissue pathology [1, 16]. The fundamental blocks of such systems are presented in Fig. 1a. In principle, it is an optical spectroscopy technique whereby a broadband white light is used to illuminate a tissue through an optical fiber probe. The reflected light is captured by the same or separate probe after being subjected to absorption and scattering in the selected tissue area. A multispectral detection system then records the reflection for different wavelengths during each experiment.

Fig. 2
figure 2

Data classification including: a, b the training and testing sets of reflectivity curves of healthy tissues (class A) and tissues with caries (class B), c classification systems using an artificial neural network with five layers based upon deep learning using specific target values for each pattern vector, and d the validation process providing probabilities of estimated classes

Spectroscopy systems in stomatology use a commonly accepted scoring system [27] that is based on the International Caries Detection and Assessment System (ICDAS). The experiments presented here included 741 measurements of healthy and unhealthy tissues, reduced by 6.9% unreliable data to 690 records. This process eliminated data whose difference from the mean curve for each class exceeded the selected threshold. The scoring system used by an experienced stomatologist then associated each measurement \(j=1,2,\ldots ,Q\) out of \(Q=690\) with either class A (422 healthy tissues) or class B (268 caries), with specific probability values. All measurements were randomly divided into two parts that included:

  1. 1.

    The training set (Fig. 2a) of 90% (621) of the observations belonging to classes A (380) and B (241), respectively, for constructing the classification model,

  2. 2.

    The testing set (Fig. 2b) of 10% (69) of observations belonging to classes A (42) and B (27), respectively, for testing the classification model.

The selection of the training and testing sets was performed randomly, to compare the classification accuracy for different groups of observations.

Figure 2a presents the training sets of observations belonging to classes A and B, together with their mean values, which show typical reflectivity plots for healthy tissues and those with dental caries. The shape of these curves is in agreement with other studies [38], pointing out that spectral components less than 570 nm (the blue-green area) are marginal for healthy tissues only.

For each experiment of a specific class, the reflectivity was evaluated and the associated features were estimated. The whole set of experiments was then analyzed by statistical tools and selected classification methods.

2.2 Feature extraction and classification

The data processing presented in Fig. 1b included the analysis of the individual records of reflectivity, evaluation of their mean values, and specification of typical reflectivity curves for each class, as presented in Fig. 2a. Data resampling was then employed, so as to have values on the regular grid of the independent variable. The standard deviation of the differences between each measurement and these typical curves was then used as a measure to extract records affected by gross measurement errors.

The pattern matrix used for the following classification was next defined for the training set, with 90% of the measured values randomly chosen, and the following construction of the pattern matrix columns standing for separate observations and pattern vectors specified by:

  • All values of each reflectivity curve inside the selected wavelength range \(\langle 450, 1550 \rangle \) nm with the sampling period of 1 nm (used for the following deep learning process),

  • Selected features associated with each reflectivity curve estimated both in the time and frequency domains (for the following standard classification).

Target values associated with each column of the pattern matrix were specified by an experienced stomatologist in both cases.

Fig. 3
figure 3

Reflectivity spectrum polynomial approximation (RSA) presenting: a, d a selected reflectivity spectrum (RS) record and its approximation by a polynomial of degree 5, b, e the set of approximate values, and c, f differences between reflectivity values and their approximation for class A and B respectively

The polynomial approximation of each reflectivity curve \(\{s_j(n)\}_{n=0}^{N-1}\) of the selected degree was then applied to each experiment \(j=1,2,\ldots ,Q\) with results presented in Fig. 3a, d for selected sets of each class. Figure 3b, e present these approximations \(\{f_j(n)\}_{n=0}^{N-1}\) for all measurements with differences \(\{d_j(n)\}_{n=0}^{N-1}\) between reflectivity values and their approximations presented in Fig. 3c, f.

The features used for standard classification methods included:

  1. 1.

    The wavelength for the maximal value of reflectivity,

  2. 2.

    The standard deviation of differences between reflectivity values and their polynomial approximations of a selected degree \(M=5\).

Spectrograms of each difference signal \(\{d_j(n)\}_{n=0}^{N-1}\) for both classes were then used to evaluate the relative mean power in frequency ranges \(r_1=\langle f_{c1}, f_{c2} \rangle \) and \(r_2=\langle f_{c3}, f_{c4} \rangle \) that formed spectral domain features for the following classification.

The features of each difference signal \(\{d_j(n)\}_{n=0}^{N-1}\) for \(j=1,2,\ldots ,Q\) were evaluated by the discrete Fourier transform as the relative power p(ij) estimated both in the frequency range \(r_1\) and \(r_2\) using the following relations:

$$\begin{aligned} p(i,j)\!=\!\frac{\sum _{k \in \Phi _i} \left| D_j(k) \right| ^2}{\sum _{k=0}^{N/2} \left| D_j(k) \right| ^2}, D_j(k)\!=\!\sum _{n=0}^{N-1} d_j(n)\;e^{-j\,kn\frac{2\pi }{N}} \end{aligned}$$
(1)

where \(\Phi _i\) is the set of indices for the range of frequencies \(f_k\! \in r_i \), \(i=1,2\), and \(j=1,2,\ldots ,Q\) for the total number of experiments Q. The pattern matrix \(\mathbf{P}_{2,Q}\) is defined in this way.

Figure 4 presents the distribution of selected couples of features used for the following classification into two classes in the signal and spectral domains. Figure 4a presents the standard deviation of differences between signals and their polynomial approximations versus the wavelength of the maximal value of the reflectivity curve. The distribution of the features evaluated in the spectral domain is presented in Fig. 4b. It presents the relative power of the differences between the signals and their approximations in two selected frequency ranges. The centers of the individual classes and c-multiples of standard deviations for \(c=0.2, 0.5\), and 1 are presented in both cases as well.

Fig. 4
figure 4

The distribution of features for classification into two classes using: a the standard deviation of differences between signals and their polynomial approximations versus the wavelength of the maximal value of the reflectivity curve and b the relative power of differences between signals and their approximations in two selected frequency ranges with centers of individual classes and c-multiples of standard deviations for \(c=0.2, 0.5\) and 1

2.3 Pattern recognition

A supervised learning strategy [10, 28] was used for classification into two classes specified in the target vector \(\mathbf{T}_{S,Q}\) to separate the individual signals into \(S=2\) categories, namely into class A (healthy tissues), and class B (tissues with caries). These classes were defined by an experienced stomatologist for the learning set of signals. To initialize the classification process, the pattern matrix \(\mathbf{P}_{R,Q}\) was constructed with each its column \(j=1,2,\ldots ,Q\) specifying an individual signal in the learning stage.

In the case of the deep learning strategy, no specific features were evaluated and each feature vector associated with signal \(j=1,2,\ldots ,Q\) included all signal values or their transform into the spectral domain. The number R of pattern matrix rows corresponded with the number of signal or spectral values. This process is presented in Fig. 2. The observed signal set was randomly divided into two groups: the first one used for training and the second one for testing, as presented in Fig 2a, b. The classification system presented in Fig. 2c was formed by five layers: the input layer, bidirectional long short-term memory (LSTM) [26] layer, fully connected layer, softmax layer, and the classification layer [13]. The optimization of model parameters was performed by the ADAM method [15].

To compare the results of the deep learning framework with the standard classification methods, two features were evaluated for each signal and the matrix \(\mathbf{P}_{2,Q}\) specified by Eq. (1) was constructed. This kind of information compression enabled both the graphical presentation of feature clusters in the two-dimensional space and the use of standard classification methods. These included the support vector machine, a Bayesian method, the k-nearest neighbor method, and a two-layer neural network. The accuracies and k-fold cross-validation errors of the final models were then used for evaluating the selected classification methods.

The standard classification models used the same set of signals for training of the individual models. The two-layer neural network system was formed by the simplified system presented in Fig. 2c: (1) a fully connected layer of a selected number of elements with the sigmoidal transfer function and (2) the classification layer. The pattern matrix \(\mathbf{P}_{R,Q}\) formed the input for the two-layer neural network to evaluate the values using the following relation:

$$\begin{aligned} \mathbf{A}_{S,Q}=f2(\mathbf{W2}\times f1(\mathbf{W1}\times \mathbf{P}, \mathbf{b1}), \mathbf{b2}) \end{aligned}$$
(2)

The network coefficients included the elements of the matrices \(\mathbf{W1}_{S1,R}\), \(\mathbf{W2}_{S,S1}\) and vectors \(\mathbf{b1}_{S1,1}\), \(\mathbf{b2}_{S,1}\). The classification models used the sigmoidal transfer function f1 and the softmax transfer function f2 in the first and the second layer, respectively [35].

The number of output elements corresponded with the number S of classes. The values in each column \(j=1,2,\ldots ,Q\) of the output matrix \(\mathbf{A}_{S,Q}\) defined the predicted probabilities of the belongingness to each class. The row number with the highest probability pointed to the estimated class number for each pattern vector.

During the learning process, the network coefficients were optimized in separate learning epochs and the performance of the classification models was evaluated by the log-loss function, which takes into account the probability that is assigned to the estimation of the target value.

The classification results obtained after the selected number of training epochs were then analyzed by the receiver operating characteristic (ROC) [29, 30]. The learning process evaluated the number of true/false negatives (TN, FN) and true/false positives (TP, FP) both in the negative set (class A: healthy tissue) and positive set (class B: caries). These measures formed the confusion matrix:

$$\begin{aligned} \mathbf{CM}=\left[ \begin{array}{lll} &{} \mathrm{TN} &{} \mathrm{FN}\\ &{} \mathrm{FP} &{} \mathrm{TP}\\ \end{array}\right] \end{aligned}$$
(3)

The associated performance metrics included:

  • The true rate (A-negative, B-positive)

    $$\begin{aligned} \mathrm{TR}(\mathrm{A})= \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}, \mathrm{TR}(\mathrm{B})= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \end{aligned}$$
    (4)
  • The predictive value (A-negative, B-positive)

    $$\begin{aligned} \mathrm{PV}(\mathrm{A})= \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FN}}, \mathrm{PV}(\mathrm{B}) = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} \end{aligned}$$
    (5)
  • The accuracy value

    $$\begin{aligned} \mathrm{ACC}= \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}} \end{aligned}$$
    (6)

The k-fold cross-validation method was then employed as a measure of the generalization capabilities of each neural network model.

3 Results

The classification of the reflectivity data of dental tissues into two classes (A: healthy, B: caries) was performed using original data sets acquired in the stomatologic laboratory by the second author. Figure 5 shows the results of the deep learning method, presenting the training accuracy and the loss for different randomly selected initializations of the whole model over ten experiments. The goal of the process was the optimization of the network to maximize the accuracy and to minimize the loss during individual epochs. This optimization was performed by the adaptive moment estimation (ADAM) method [15] in the mathematical and software environment of Matlab2021a.

Fig. 5
figure 5

Results of the deep learning-based classification: training accuracy and loss values for feature matrix constructed by column vectors of a individual signals and b spectral values of their differences related to the polynomial approximation of the fifth degree over 300 training epochs for different selections of training subsets and initial conditions

Fig. 6
figure 6

Incremental values of the accuracy and loss in the subsequent learning epochs enabling improving the behavior of the deep neural network used for classification into two classes with randomly selected training system initializations and average values of the accuracy and loss in each stage

The optimization process was performed for the deep neural network structure presented in Fig. 2c. The proposed model was constructed as a five-layer neural network for the LSTM layer with 100 units followed by the fully connected layer with two outputs. The initial training rate and gradient threshold were 0.002 and 1, respectively. The optimization process was performed in the proposed incremental mode that allowed modifying the structure and network coefficients during the learning stage. Figure 6 presents the incremental values of the accuracy and the loss in the subsequent learning stages together with their mean values over all values in each stage. This process enabled improving the behavior of the deep neural network used for classification into two classes with the randomly selected training system initializations.

Table 1 presents the accuracy ACC and the loss value LV for ten deep learning experiments and different network initializations presenting the final value (F) after 300 learning epochs and its mean (F10) evaluated from its ten last values. The training was performed for the training set of values presented in Fig. 2a and verification for the different set in Fig. 2b of 10% of experiments randomly chosen from all observations. The mean value of the accuracy and the loss evaluated from the last 10 epochs were 97.96% and 0.06, respectively.

Table 1 The final accuracy ACC (%) and the loss value LV of 10 deep learning experiments after 300 learning epochs for different network initializations presenting final (F) and mean values (F10) evaluated from ten last epochs
Table 2 Signal domain features processing: confusion matrix of the classification by the deep learning (DL) neural network model for the training and testing sets with true positive values on the matrix diagonal (in the bold), true positive/negative rates \(\mathrm{TR}(k)\), and positive/negative prediction values \(\mathrm{PV}(k)\)

Tables 2 and 3 present the confusion matrix of the classification by the deep learning neural network model for the training and testing sets and the selected experiment. The results based on the signal domain features presented in Table 2 show higher precision both for class A and class B. The sensitivity and specificity are higher for the signal domain pattern matrix and the final accuracy for signal and spectral domain features is 98.1% and 94.4%, respectively.

Fig. 7
figure 7

Classification of healthy tissues and caries for two features evaluated as a, b the wavelength of the maximal reflectivity value and the standard deviation of the differences of reflectivity values and their approximation and c, d the relative power of the difference signal in two frequency bands using a, c the SVM machine and b, d the two-layer neural network

Table 3 Spectral domain features processing: Confusion matrix of the classification by the deep learning (DL) neural network model for the training and testing sets with true positive values on the matrix diagonal (in the bold), true positive/negative rates \(\mathrm{TR}(k)\), and positive/negative prediction values \(\mathrm{PV}(k)\)

The classification results achieved by the deep neural network were compared with those achieved by the SVM machine, Bayesian method, and the two layer neural network with the sigmoidal and softmax transfer function. Figure 7 presents the classification into two classes (healthy tissues and caries) for two features evaluated as the wavelength of the maximal reflectivity value and the standard deviation of the reflectivity values in the decreasing part of reflectivity using selected classification methods with accuracy (ACC) and cross-validation (CV) errors with class boundaries.

A comparison of the accuracy and cross-validation errors for the classification of stomatological data into two classes by standard methods (support vector, Bayesian, k-nearest neighbor, the two-layer neural network), and the five-layer deep neural network after 300 learning epochs is presented in Table 4. For the given data set, the results of the classifications by the standard classification methods both for the signal and spectral domain features were very close. The classification accuracy for the deep learning systems is 98.1% and 94.4% for data in the signal and spectral domains, respectively, with the loss criterion below 0.01 in both cases. The efficiency of the deep learning is much higher for the spectral domain features with not so well-separated clusters.

Table 4 Accuracy (ACC) and Tenfold cross-validation (CV) errors for classification of stomatological data into two classes by the support vector (SVM), Bayesian method, seven-nearest neighbor method (NM), the two-layer neural network (NN), and the proposed five-layer deep neural network (in bold)

4 Conclusion

This paper investigated the effectiveness of selected mathematical methods for the classification of reflectivity data recorded for a total number of 741 dental tissues in the stomatological laboratory. The classification into two categories (healthy tissues and caries) was performed by a deep learning neural network and the ADAM method for model parameter optimization. Its results were compared with classifications by standard methods, which included the support vector machine, a Bayesian method, k-nearest neighbors, and a two-layer neural network. For the deep learning method, no features were evaluated and the complete records formed the input into the classification system. By contrast, in the standard approach to classification based on pattern recognition, feature vectors were estimated from reflectivity data preprocessed by the proposed polynomial approximation.

The results have shown that the deep learning neural network system was able to distinguish caries from healthy tissues without any preprocessing or selection of features, with better accuracy than the standard methods. The classification accuracy was 98.1% and 94.4% for the deep learning system using signal and spectral domain features, respectively. The loss criterion was below 0.01 in both cases.

The deep learning system for signal segment classification has been able to eliminate the process of selecting signal features, which can be sometimes rather complicated and time-consuming. On the other hand, the advantage of the evaluation of features can lie in the deeper understanding of the signal properties both in the time and frequency domain and the possibilities of their visualization. The combination of both approaches to data processing can bring their benefits to obtain a deeper understanding of the physical background of the studied system.

It is expected that further research will be devoted to the deep learning and standard classification systems to benefit from their specific advantages. In the case of deep learning systems, with their complicated structures and with the possibilities of an incremental learning strategy, it will be possible to study the effects of varying the system structure and its coefficients during the learning process. Further studies of the distribution of specific features will enable a deeper data analysis and the use in the clinical environment. In both cases, new machine learning strategies and their implementation for real biomedical data processing and for monitoring physiological functions will be studied as well.