1 Introduction

Periodontal infections and dental caries are prevalent, non-transmissible conditions and leading causes of tooth loss. Dentists are typically responsible for managing these concerns, while more intricate cases are referred to practitioners in specialized dental disciplines, including endodontics, oral and maxillofacial surgery, periodontics, prosthodontics, orthodontics and restorative dentistry. Timely identification and management of dental conditions can improve treatment outcomes, including halting disease progression, promoting overall well-being and social functioning, and reducing the risk of tooth extraction. Deep learning-based studies play a crucial role in tooth categorization research, assisting dentists in understanding disease types more accurately, enhancing diagnostic capabilities, and facilitating the development of more effective treatment methods.

Among the modalities of digital dental radiography [2], intraoral X-rays [1] are distinguished from panoramic X-rays [3], cephalometric X-rays [4] and cone beam computed tomography (CBCT) [5] by how widely they are used across the phases of oral healthcare. Periapical and bitewing images [6] are intraoral X-rays that capture three to four teeth per image and provide information on the bone and surrounding oral tissue.

For improving dental disease diagnosis and treatment planning, deep learning [7] has significantly influenced the processing of intraoral X-ray images, including image processing [8], segmentation [9,10,11,12] and enhancement [13,14,15]. These developments, integrating intraoral X-ray imaging with deep learning techniques, enhance the precision with which oral health conditions are recognized and detected, such as dental caries [16,17,18,19,20,21,22], implants [23], oriented teeth [24], restorations by filling [25] and dental materials [26]. This study adapts deep learning techniques on a novel dataset to recognize and detect twenty categories: abscessed teeth, calculus, caries, cysts, dental bridges, dental crowns, extracted teeth, filling overhang, impacted wisdom teeth, implants, mesialized dentition, mixed dentition, periodontal bone loss, pulpitis, restoration by filling, retained root, screw-retained restoration, single-root canal treatment, two-root canal treatment and three-root canal treatment.

The Common Vector Approach (CVA) is used here as a pooling technique to enhance the accuracy of deep learning models and to overcome limitations associated with average pooling [27,28,29,30,31,32]. The CVA concept is derived from Principal Component Analysis (PCA); whereas PCA projects onto the eigenvectors with the largest eigenvalues, CVA projects onto those with the smallest eigenvalues to extract more informative and discriminant features. Using CVApool, outlier features in a batch of data can be suppressed by exploiting the difference and common vectors computed for that batch, yielding a more meaningful representation of the feature vectors.

In this study, a deep convolutional neural network (CNN) model is developed using CVApool to enhance the accuracy of diagnosing dental diseases from intraoral X-rays. Initially, a CNN was trained to automatically assess the health state of X-ray images, categorizing them based on a blind test set of tooth samples. Subsequently, the CVApool layer was developed to reduce dimensionality by coupling the Gram–Schmidt orthogonalization technique with the CVA method. CVApool was applied to each data batch (matrix), where each column vector corresponds to a unique feature vector. Finally, several experiments were conducted using various deep learning models, such as EfficientNetB1, B2 and B3, to evaluate the performance of CVApool.

The remainder of this paper is organized as follows: Sect. 2 reviews related studies. Section 3 introduces the CVApool technique and outlines the steps taken to implement the approach in a CNN model. Section 4 presents a comprehensive summary of the proposed method’s performance across various datasets based on objective evaluations. The concluding section discusses the implications of the findings.

2 Related work

After the digitalization of dental imaging, researchers have been interested in developing AI algorithms for classifying dental conditions from intraoral X-ray images. One such study developed a Dental Diagnosis System based on feature extraction, segmentation, classification and decision-making for classifying dental conditions, including root fracture, teeth, decay, missing teeth and resorption of periodontal bone, with an accuracy of 92.74% [33]. In another study, geometric features of dental images were extracted to detect dental caries with a detection precision of 97% [34]. Geetha et al. developed a diagnostic system using statistical feature extraction and a back-propagation neural network for detecting dental caries in digital radiographs and achieved an accuracy of 97.1% [35].

In recent studies, researchers have focused on validating the efficiency and accuracy of deep learning algorithms for analyzing intraoral X-ray images. After the development of CNNs, deep learning architectures for dental imaging evolved as variations of CNNs, such as VGGNet [36], GoogLeNet [21], AlexNet [37], EfficientNet [38] and DenseNet [39]. For example, Lee et al. used a pre-trained GoogLeNet Inceptionv3 CNN with transfer learning and achieved high diagnostic accuracies for detecting dental caries in radiographic images [21]. Vasdev et al. compared ResNet18, ResNet34 and AlexNet for classifying caries, root canal treatment, abscess, bone loss and missing teeth over 16,000 dental images, with an accuracy of up to 85.2%. Chen et al. combined EfficientNetB0 and fully connected layers for periodontitis and dental caries recognition and achieved a precision of 97.1%.

Deep learning models automatically extract relevant features and classify dental images based on the labels of X-ray images, and several studies classify dental abnormalities with such models. Imak et al. developed a multi-input deep convolutional neural network ensemble model using machine learning and image processing techniques and achieved an accuracy score of 99.13% for diagnosing dental caries [17]. Zhu et al. developed a diagnosis method to predict and localize caries lesions on periapical X-ray images based on artificial intelligence and the Faster R-CNN method and achieved a precision of 73.49% [22]. Kohlakala et al. trained a fully convolutional network (FCN) on generated X-ray images to identify dental implant connections and achieved a classification accuracy of 71.7% [23]. Leo and Reddy categorized caries lesions based on their type and severity by combining Artificial Neural Network (ANN) and Deep Neural Network (DNN) techniques into a Hybrid Neural Network (HNN) and achieved an accuracy of 96% on the classification of caries levels [8]. Aparna et al. estimated the level of filling in teeth on X-ray images using Mask R-CNN with a ResNet50 architecture [25]. Singh and Sehgal developed a deep convolutional neural network (CNN) with a Long Short-Term Memory (LSTM) model for the detection and diagnosis of dental caries.

Convolutional neural networks (CNNs) are a type of artificial neural network developed specifically to handle signals, sequences, images or volumetric data. CNNs have demonstrated their efficacy in many tasks, including fault analysis [40, 41], image segmentation and image recognition. Traditional machine learning methodologies require feature extraction or engineering to obtain precise predictions; the success of such feature-driven algorithms depends on the characteristics of the data and incurs substantial computing expense on huge datasets. Deep learning models, by contrast, transform raw data into several layers of intermediate features during training, without human intervention. Moreover, the generalization capacity of traditional methods is restricted, whereas deep learning models exhibit unprecedented generalization performance. Despite requiring a substantial quantity of labeled data, high-performance hardware and training time, CNN-based algorithms exhibit superior processing capabilities and outperform commonly used machine learning methods. Considering these facts, we employ a new CNN pooling layer for tooth disease classification.

3 Proposed pooling layer

3.1 Definition and contributions of CVA

The subspace-based recognition approach known as the Common Vector Approach (CVA) has demonstrated favorable outcomes across various applications and classification challenges. The CVA concept is derived from the underlying principles of Principal Component Analysis (PCA). PCA projects the data onto the eigenvectors corresponding to the largest eigenvalues; the CVA technique reverses this process, projecting the data onto the eigenvectors associated with the smallest eigenvalues. It has been shown that the eigenvectors associated with the smallest (or zero) eigenvalues are the most representative of a class, as they exhibit the shared properties of its vectors without any variance. There is a strong relationship between the common vector, the indifference subspace (null space) and the zero principal components. One can therefore say that the common vector is a special vector describing the characteristics shared by all instances of a class.
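In eigenspace terms (our notation, not one of the paper’s numbered equations): if \(\mathbf{e}_j\) are the eigenvectors of a class’s scatter matrix with eigenvalues \(\lambda_j\), PCA keeps the projections onto the leading eigenvectors, whereas the common vector keeps the projections onto those with (near-)zero eigenvalues:

$$\mathbf{a}_{PCA} = \sum_{j:\,\lambda_j\ \mathrm{largest}} \left\langle \mathbf{a}, \mathbf{e}_j \right\rangle \mathbf{e}_j, \qquad \mathbf{a}_{com} = \sum_{j:\,\lambda_j\, \approx\, 0} \left\langle \mathbf{a}, \mathbf{e}_j \right\rangle \mathbf{e}_j$$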

The definition of the common vector is not complex in terms of implementation. Suppose that we are given m samples of a processed class, \(\{\mathbf{a}_i\},\ i = 1, 2, \ldots, m\). Each vector \(\mathbf{a}_i\) can then be written as the sum \(\mathbf{a}_i = \mathbf{a}_{com} + \mathbf{a}_{i,diff}\). In other words, the common vector \(\mathbf{a}_{com}\) is what remains after the difference vectors of the class members are subtracted, and it is constant across the whole class. The vector \(\mathbf{a}_{i,diff}\), in turn, is the remaining vector, which captures the unique residual trend of that specific sample.

The vector encompassing the attributes that remain unchanged is called the common vector. The CVA method also provides an effective means of dimensionality reduction. Depending on the vector size (n) and the number of samples (m) in each class, two scenarios may arise during the extraction of the common vector. If m > n, we have the so-called sufficient case, where the number of observations is larger than the number of features. The second is the insufficient case (m ≤ n). The key benefit of CVA over other subspace-based classification techniques is that it yields a solution in this so-called insufficient data situation. Considering the output of a standard CNN, the length of a feature vector is greater than the number of vectors in a class, so this output is an excellent illustration of the insufficient data situation.

In mathematical terms, the common vector may be determined by subtracting from the average vector the sum of the projections of the average vector onto the orthonormal basis. We may thus state that the common vector differs from the mean vector. A numerical demonstration is provided below, and the difference in performance is compared on a straightforward example, to clarify how the contribution of the common vector compares to that of the average vector. Since the output of a CNN is handled in vector format, the insufficient data situation has been the primary focus of our investigation.

Let us assume that we are given m samples. Here \(\mathbf{a}_i\) denotes one training vector of the batch data, \(\mathbf{a}_{com}\) indicates the common vector of the processed CNN output for a single batch, and \(\mathbf{z}_i\) are the orthonormal basis vectors returned by Gram–Schmidt orthogonalization. The following expressions then hold for all training set vectors:

$$\begin{aligned} \mathbf{a}_1 &= \mathbf{a}_{com} + \left\langle \mathbf{a}_1, \mathbf{z}_1 \right\rangle \mathbf{z}_1 + \left\langle \mathbf{a}_1, \mathbf{z}_2 \right\rangle \mathbf{z}_2 + \ldots + \left\langle \mathbf{a}_1, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1} \\ \mathbf{a}_2 &= \mathbf{a}_{com} + \left\langle \mathbf{a}_2, \mathbf{z}_1 \right\rangle \mathbf{z}_1 + \left\langle \mathbf{a}_2, \mathbf{z}_2 \right\rangle \mathbf{z}_2 + \ldots + \left\langle \mathbf{a}_2, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1} \\ &\;\;\vdots \\ \mathbf{a}_m &= \mathbf{a}_{com} + \left\langle \mathbf{a}_m, \mathbf{z}_1 \right\rangle \mathbf{z}_1 + \left\langle \mathbf{a}_m, \mathbf{z}_2 \right\rangle \mathbf{z}_2 + \ldots + \left\langle \mathbf{a}_m, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1} \end{aligned}$$
(1)

By summing both sides of the preceding equations over all samples and then dividing by m, we get:

$$\begin{aligned} \sum_{i=1}^{m} \mathbf{a}_i &= m\,\mathbf{a}_{com} + \left( \sum_{i=1}^{m} \left\langle \mathbf{a}_i, \mathbf{z}_1 \right\rangle \right) \mathbf{z}_1 + \ldots + \left( \sum_{i=1}^{m} \left\langle \mathbf{a}_i, \mathbf{z}_{m-1} \right\rangle \right) \mathbf{z}_{m-1}, \\ \frac{1}{m}\sum_{i=1}^{m} \mathbf{a}_i &= \mathbf{a}_{com} + \left\langle \frac{1}{m}\sum_{i=1}^{m} \mathbf{a}_i, \mathbf{z}_1 \right\rangle \mathbf{z}_1 + \ldots + \left\langle \frac{1}{m}\sum_{i=1}^{m} \mathbf{a}_i, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1}, \quad \text{and} \\ \mathbf{a}_{com} &= \mathbf{a}_{ave} - \left\langle \mathbf{a}_{ave}, \mathbf{z}_1 \right\rangle \mathbf{z}_1 - \left\langle \mathbf{a}_{ave}, \mathbf{z}_2 \right\rangle \mathbf{z}_2 - \ldots - \left\langle \mathbf{a}_{ave}, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1} \end{aligned}$$
(2)

The common vector may thus be found by subtracting from the average vector its projection onto the complete orthonormal basis, as shown above. To analyze the behaviors of the average vector and the common vector, we provide a toy example with three very basic training vectors. Suppose the three vectors are a1 = [1 1 1]T, a2 = [1 1 -1]T and a3 = [1 5 5]T. Normalizing them with the sigmoid activation function gives a1 = [0.7311 0.7311 0.7311]T, a2 = [0.7311 0.7311 0.2689]T and a3 = [0.7311 0.9933 0.9933]T. The common vector of these three vectors can be obtained by constructing the difference subspace (b1 and b2) with a1 as the reference vector:

$${\mathbf{b}}_{{\mathbf{1}}} = \left[ {0\;0\; - 0.4621} \right]^{{\text{T}}} \quad {\text{and}}\quad {\mathbf{b}}_{{\mathbf{2}}} = \left[ {0\;0.2622\;0.2622} \right]^{{\text{T}}}$$
(3)

We then proceed to Gram–Schmidt orthogonalization, applying the following rules to obtain the common vector for the training set:

$${\mathbf{d}}_{1} = {\mathbf{b}}_{1} ,\quad {\mathbf{z}}_{1} = \frac{{{\mathbf{b}}_{1} }}{{\left\| {{\mathbf{b}}_{1} } \right\|}} = \left[ {0\;0\; - 1} \right]^{{\text{T}}}$$
(4)
$${\mathbf{d}}_{2} = {\mathbf{b}}_{2} - \left\langle {{\mathbf{b}}_{2} ,{\mathbf{z}}_{1} } \right\rangle {\mathbf{z}}_{1} = \left[ {0\;0.2622\;0} \right]^{{\text{T}}} ,\quad {\mathbf{z}}_{2} = \frac{{{\mathbf{d}}_{2} }}{{\left\| {{\mathbf{d}}_{2} } \right\|}} = \left[ {0\;1\;0} \right]^{{\text{T}}}$$
(5)

Here (z1, z2, …, z(m-1)) denotes the orthonormal basis, obtained by normalizing the orthogonalized difference vectors (d1, d2, …, d(m-1)). As a result, the sum of the projections of a1 onto the orthonormal basis of the difference subspace B, denoted asum, may be computed in the following manner:

$${\mathbf{a}}_{{{\mathbf{sum}}}} = \left\langle {{\mathbf{a}}_{1} ,{\mathbf{z}}_{1} } \right\rangle {\mathbf{z}}_{1} + \left\langle {{\mathbf{a}}_{1} ,{\mathbf{z}}_{2} } \right\rangle {\mathbf{z}}_{2} = \left[ {0\;0.7311\;0.7311} \right]^{{\text{T}}}$$
(6)

In conclusion, the common vector is obtained by subtracting the corresponding projection sum from either the reference vector or the average vector. Using the reference vector (a1), the common vector is determined as shown in Eq. (7).

$${\mathbf{a}}_{{{\mathbf{com}}}} = {\mathbf{a}}_{{\mathbf{1}}} - {\mathbf{a}}_{{{\mathbf{sum}}}} = \left[ {0.7311\quad 0\quad 0} \right]^{{\text{T}}}$$
(7)

Working with the same set of vectors, one can also obtain the common vector by first calculating the average vector, defined as follows:

$${\mathbf{a}}_{{{\mathbf{ave}}}} = \frac{1}{{\mathbf{m}}}\sum\limits_{{{\mathbf{i}} = 1}}^{{\mathbf{m}}} {{\mathbf{a}}_{{\mathbf{i}}} } = \left[ {0.7311\quad 0.8185\quad 0.6644} \right]^{{\text{T}}}$$
(8)

Then, we project \(\mathbf{a}_{ave}\) onto the orthonormal basis:

$${\mathbf{a}}_{{{\mathbf{ave}},{\mathbf{sum}}}} = \left\langle {{\mathbf{a}}_{{{\mathbf{ave}}}} ,{\mathbf{z}}_{1} } \right\rangle {\mathbf{z}}_{1} + \left\langle {{\mathbf{a}}_{{{\mathbf{ave}}}} ,{\mathbf{z}}_{2} } \right\rangle {\mathbf{z}}_{2} = \left[ {0\quad 0.8185\quad 0.6644} \right]^{{\text{T}}}$$
(9)

As a result, the following would be the common vector:

$${\mathbf{a}}_{{{\mathbf{com}}}} = {\mathbf{a}}_{{{\mathbf{ave}}}} - {\mathbf{a}}_{{{\mathbf{ave}},{\mathbf{sum}}}} = \left[ {0.7311\quad 0\quad 0} \right]^{{\text{T}}} ,$$
(10)

This is exactly the same vector found via Eq. (7). Nevertheless, if we compute the magnitudes of the differences between a test sample and, respectively, the average vector and the common vector, we observe a significantly different margin for classification:

$$Dist_{1}^{ave} = \left\| {{\mathbf{test}} - {\mathbf{a}}_{{{\mathbf{ave}}}} } \right\|^{2} = 0.1862,\quad Dist_{1}^{com} = \left\| {{\mathbf{test}} - {\mathbf{a}}_{{{\mathbf{com}}}} } \right\|^{2} = 0.8857$$
(11)

These contrasting distances suggest that the common vector and the average vector behave radically differently as class representatives. The resulting Dist values serve as a reference point for comparing the capabilities of the average vector and the common vector: for the best classification performance, we want this distance to be as large as feasible. Clearly, \(Dist_{1}^{com}\) produces a substantially larger margin, whereas \(Dist_{1}^{ave}\) is very small. Compared with the average-vector experiment, the toy example demonstrates that CVA has the potential to enhance the margin between test samples and the common vector of a class.
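For readers who want to verify these numbers, the following NumPy sketch (our illustration, not the authors’ released code) reproduces the toy example end to end and confirms that both routes yield the same common vector:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The three raw training vectors, squashed by the sigmoid as in the text.
a1, a2, a3 = sigmoid(np.array([[1.0, 1.0, 1.0],
                               [1.0, 1.0, -1.0],
                               [1.0, 5.0, 5.0]]))

# Difference subspace with a1 as the reference vector (Eq. 3).
b1, b2 = a2 - a1, a3 - a1

# Gram-Schmidt orthonormalization (Eqs. 4-5).
z1 = b1 / np.linalg.norm(b1)          # [0, 0, -1]
d2 = b2 - np.dot(b2, z1) * z1
z2 = d2 / np.linalg.norm(d2)          # [0, 1, 0]

# Route 1: subtract the projection of the reference vector (Eqs. 6-7).
a_sum = np.dot(a1, z1) * z1 + np.dot(a1, z2) * z2
a_com_ref = a1 - a_sum                # [0.7311, 0, 0]

# Route 2: subtract the projection of the average vector (Eqs. 8-10).
a_ave = (a1 + a2 + a3) / 3
a_ave_sum = np.dot(a_ave, z1) * z1 + np.dot(a_ave, z2) * z2
a_com_ave = a_ave - a_ave_sum         # [0.7311, 0, 0]

assert np.allclose(a_com_ref, a_com_ave)
```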

3.2 Using CVA as pooling layer

The CVA approach is utilized as a pooling process to provide a meaningful and uniform representation of the feature vectors. To clarify, the common vector of a column representation of the k feature vectors can be obtained in one of two ways: first, by employing an eigenspace model made up of the eigenvectors corresponding to the smallest eigenvalues, and second, by utilizing the Gram–Schmidt orthogonalization procedure to obtain the orthonormal vectors of the processed data in the insufficient data case. We have focused on the Gram–Schmidt orthogonalization process rather than the eigendecomposition, since finding the eigenvectors for a large data dimension demands a significant amount of computer memory.

The pure PCA approach suffers from a number of drawbacks, most notably the high computing cost of eigenvector decompositions, the need to standardize the data and the loss of information. Considering these negative impacts, we developed the CVA approach to create an accurate distance map between the test samples and the class samples. During classification, categorical cross-entropy loss was found to offer superior results when running CVApool with CNN models. The following steps provide additional detail on the implementation of CVA for a single batch of data.

The general stages involved in putting the CVApool concept into action may be summarized as follows. Suppose the CNN produces batch data of size b × k × n × n, for example (64 × 1280 × 7 × 7). Each sample can be reshaped as (1280 × 49), and we find the common vector for that sample, returned as (1 × 1280), since there are 49 vectors of dimension 1280. Eventually we obtain (64 × 1280) data, since the batch size is 64. The common vector of the batch data, ab,com, is computed through the following steps (a code sketch is given after the steps).

  • To begin, the difference vector ab,diff is derived by first projecting the reference vector (at) onto the orthonormal basis (z1, z2, …, z(k-1)), which is accomplished by the Gram–Schmidt process.

    $${\mathbf{a}}_{{{\mathbf{b}},{\mathbf{diff}}}} = \left\langle {{\mathbf{a}}_{{\mathbf{t}}} ,{\mathbf{z}}_{1} } \right\rangle {\mathbf{z}}_{1} + \left\langle {{\mathbf{a}}_{{\mathbf{t}}} ,{\mathbf{z}}_{2} } \right\rangle {\mathbf{z}}_{2} + \ldots + \left\langle {{\mathbf{a}}_{{\mathbf{t}}} ,{\mathbf{z}}_{{{\text{k}} - 1}} } \right\rangle {\mathbf{z}}_{{{\text{k}} - 1}}$$
    (12)
  • Following this, a common vector of batch data may be obtained by subtracting the ab,diff from the reference vector (at), as seen in Eq. (13).

    $${\mathbf{a}}_{{{\mathbf{b}},{\mathbf{com}}}} = {\mathbf{a}}_{{\mathbf{t}}} - {\mathbf{a}}_{{{\mathbf{b}}{,}{\mathbf{diff}}}}$$
    (13)
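As a concrete illustration of these steps, the following PyTorch module is a minimal sketch of CVApool under our own naming (the class name CVAPool and the eps guard are ours); it is not the authors’ released implementation. It reshapes each b × k × n × n feature map into n² column vectors of dimension k, builds the difference subspace against the first vector as reference, orthonormalizes it with Gram–Schmidt, and subtracts the projection of the reference vector (Eqs. 12 and 13):

```python
import torch
import torch.nn as nn

class CVAPool(nn.Module):
    """Pools a (b, k, n, n) feature map to (b, k) by computing, per sample,
    the common vector of its n*n spatial feature vectors (Eqs. 12-13).
    Illustrative sketch; a batched/vectorized version would be faster."""

    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.eps = eps  # tolerance for dropping (nearly) dependent vectors

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, k, h, w = x.shape
        feats = x.reshape(b, k, h * w)          # e.g. (64, 1280, 49)
        pooled = []
        for s in range(b):
            vectors = feats[s].T                # (m, k): m = h*w vectors of dim k
            a_t = vectors[0]                    # reference vector
            diffs = vectors[1:] - a_t           # difference subspace vectors
            # Gram-Schmidt orthonormalization of the difference vectors.
            basis = []
            for d in diffs:
                for z in basis:
                    d = d - torch.dot(d, z) * z
                norm = torch.linalg.norm(d)
                if norm > self.eps:
                    basis.append(d / norm)
            # Eq. (12): projection of the reference onto the orthonormal basis.
            a_diff = torch.zeros_like(a_t)
            for z in basis:
                a_diff = a_diff + torch.dot(a_t, z) * z
            # Eq. (13): the common vector for this sample.
            pooled.append(a_t - a_diff)
        return torch.stack(pooled)              # (b, k), e.g. (64, 1280)
```

Since Gram–Schmidt and a thin QR factorization span the same subspace, torch.linalg.qr on the stacked difference vectors could replace the explicit loops at lower cost; the loop form above is kept only to mirror the equations.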

Figure 1 shows the overall system for tooth disease categorization. Experiments were carried out using various deep learning models, including EfficientNetB1 [42], EfficientNetB2, EfficientNetB3 and EfficientNetV2S. The batch size is set to 32, and the optimizer is Adam [43]. The learning rate starts at 0.001 and is regulated by a coefficient of 0.3. Training runs for 60 epochs for the pre-trained models and 100 epochs for the non-pre-trained models. Strategies such as random resized crops and flip augmentations are used to improve performance.
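Purely as a hedged sketch of this configuration, the snippet below wires the CVAPool module from the previous sketch into a torchvision EfficientNetB2; the plateau-based scheduler, the 260-pixel crop and the pre-trained weight choice are our assumptions rather than details stated in the paper:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_CLASSES = 20  # the twenty dental condition classes

# Pre-trained backbone; swap global average pooling for CVAPool.
model = models.efficientnet_b2(weights=models.EfficientNet_B2_Weights.DEFAULT)
model.avgpool = CVAPool()  # output is (b, k), so the subsequent flatten is a no-op
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, NUM_CLASSES)

# Reported settings: Adam, initial learning rate 0.001, regulated by a 0.3 factor.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.3)
loss_fn = nn.CrossEntropyLoss()  # categorical cross-entropy, as noted in Sect. 3.2

# Reported augmentations: random resized crops and flips.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(260),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```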

Fig. 1

Proposed tooth disease classification system

4 Experimental study

4.1 Dataset

Dental X-ray images were collected from 1887 patients at a dental clinic in Ankara, Turkey, between 2006 and 2023 using dental imaging software. 2971 images are included in the dataset, divided into 6 categories and 20 classes, as summarized in Table 1. Figure 2 presents samples of the considered tooth types.

Table 1 Experimental dataset
Fig. 2

An example X-ray image of 1. abscessed teeth; 2. calculus; 3. caries; 4. cysts; 5. dental bridges; 6. dental crowns; 7. extracted teeth; 8. filling overhang; 9. impacted wisdom teeth; 10. implant; 11. mesialized dentition; 12. mixed dentition; 13. periodontal bone loss; 14. pulpitis; 15. restoration by filling; 16. retained root; 17. screw-retained restoration; 18. single-root canal treatment; 19. two-root canal treatment; 20. three-root canal treatment

4.2 Performance evaluation on tooth dataset

Accurate decision-making facilitated by artificial intelligence enables dentists to propose a range of treatment options for restoring damaged teeth; these options help recover the aesthetic, anatomical and physiological functions of teeth. For this reason, we conducted various tests using small and large dental datasets.

The initial experiment segmented the entire dataset into seven distinct dental clinical departments, following terminology devised by dental professionals; these departments include orthodontics, endodontics, maxillofacial surgery, prosthodontics, periodontics and restorative dentistry. Category-wise accuracy and f-measure were investigated, and the confusion matrices for each category are presented. The second experiment increased the total number of classes to assess the models’ ability to model the data properly; consequently, performance was assessed on a comprehensive set of twenty dental problems, encompassing all diseases encountered across the seven dental clinics.

Furthermore, a comparative analysis of the experimental results was conducted using both pre-trained and untrained convolutional neural network (CNN) models. The CNN models’ performance under test settings was evaluated with objective measures, namely accuracy and f1-score. Table 2 displays the discrimination accuracy for the seven categories, disregarding the subcategories within each class.

Table 2 Performance results for seven categories

Also, Table 3 presents the performance scores on the 20 forms of dental illness drawn from these categories: orthodontics, endodontics, maxillofacial surgery, prosthodontics, periodontics, restorative dentistry and other dental professions. The estimation results of the deep learning architectures EfficientNetB1, EfficientNetB2, EfficientNetB3 and EfficientNetV2S are presented in Tables 2 and 3. For each pooling layer, we analyzed the trade-offs associated with each experiment.

Table 3 Performance results for all of 20 distinct tooth categories

According to the findings in Table 3, the highest accuracy, 86.36%, is attained by the pre-trained EfficientNetB3 model with CVApool. The EfficientNetB2 model earned the greatest f-score, at 0.8327. Compared with Table 2, the number of classes was increased during training of the CNN models. Notably, our analysis reveals that including the CVApool layer enhanced performance across nearly all CNN models, particularly when the number of classes in dental disease classification was raised to twenty disease types. Furthermore, it may be inferred that complex CNN models often overfit on large datasets; conversely, the model with the fewest parameters, EfficientNetB2, demonstrated the highest recognition performance.

The third experiment is a class-wise evaluation for each dental clinic field, as shown in Table 4. The number of classes per category is as follows: Endodontics (6 classes), MaxilloFacialSurgery (2 classes), Periodontics (3 classes), Prosthodontics (4 classes) and RestorativeDentistry (3 classes). The accuracy on test samples is given in Table 4. Comparing CNN models with CVApool and AVGpool layers on class-wise disease identification, the proposed CVA-based pooling method significantly increases the accuracy rates.

Table 4 Performance results on class-wise evaluation

In summary, the CVApool layer produced the following classification scores: Endodontics (EfficientNetB2, 89.41%), MaxilloFacialSurgery (EfficientNetB3, 95.45%), Periodontics (EfficientNetB2, 89.41%), Prosthodontics (EfficientNetB2, 100%) and RestorativeDentistry (EfficientNetB3, 100%). Performance declined as the number of classes increased; the generalization capacity of convolutional neural networks is a plausible explanation for this phenomenon. The results suggest that replacing the AVGpool layer with CVApool can enhance the classification and prediction scores of a typical image-based scenario.

The confusion matrices obtained with the EfficientNetB2 architecture using AVGpool and CVApool were employed to classify the various dental disorders. Figure 3 presents them to facilitate visualization and analysis of the misclassified samples within each group. The model exhibits suboptimal performance in Endodontics when distinguishing between abscessed teeth and cysts. In the MaxilloFacialSurgery category, the accuracy of diagnosing impacted wisdom teeth and retained root conditions is roughly balanced. In Periodontics, four instances were misclassified between the calculus and periodontal bone loss categories. There are no misclassified samples in Prosthodontics. Accuracy is similarly balanced within RestorativeDentistry.

Fig. 3

Confusion matrices for categories using CVApool with pre-trained EfficientNetB2

As a further evaluation of CVApool, a series of experiments was conducted with various activation functions on the Caltech-101 dataset, partitioned into 80% training and 20% validation. The accuracy achieved by CVA without an activation function was 95.3390%; CVApool achieves 95.9880% with the tanh activation function and 96.401% with the sigmoid activation function. CVApool was also applied with different activation functions on the dental dataset, resulting in an accuracy of 96.401% with sigmoid activation. These results suggest that CVApool produced the most favorable outcomes on the positive weights of the CNN’s features.

The experiments were conducted on the Google Colab platform using a Tesla T4 GPU. The NVIDIA Tesla T4 is a Graphics Processing Unit (GPU) based on the Turing architecture and is specifically designed to accelerate deep learning inference. The card is equipped with 40 streaming multiprocessors (SMs) sharing a 6 MB L2 cache, along with 16 GB of GDDR6 memory directly linked to the processor.

4.3 Discussion

Dental diseases play a crucial role in an individual’s overall health, since problems related to teeth can lead to a range of health difficulties and a decrease in quality of life. Oral health is therefore central to assessing one’s overall health, affecting both general well-being and the ability to consume a wide range of foods. Identifying and treating dental conditions can decrease the probability of acquiring cardiovascular disease, diabetes and other ailments associated with oral bacteria. However, identification through expert evaluation is a time-consuming and potentially costly endeavor. To tackle these issues, artificial intelligence techniques may be applied to efficiently identify dental diseases, facilitating accurate treatment outcomes (Fig. 4).

Fig. 4

Performance of CVApool on various activation functions

To uncover the role of activation functions in conjunction with CVApool, an ablation study is conducted using the CNN algorithm. The pre-trained EfficientNetB2 is evaluated using conventional data augmentation techniques. The potential of ReLU6, Swish, Mish, Softplus, SELU and Hardswish has been examined on the Endodontics dataset; the experiment alters the activation function supplying features to CVApool. Swish demonstrated an admirable degree of accuracy in identifying the tooth disease type when integrated with our proposed CVApool layer.

Convolutional neural network (CNN) designs offer a notable benefit over earlier methodologies due to their remarkable reusability. Table 5 lists several deep learning studies on the classification of dental abnormalities. Variations in accuracy rate may be attributed to model complexity, generalization capacity, and the presence of dropout and pooling layers: a dropout layer is usually incorporated to mitigate overfitting, whereas a pooling layer compresses the features in a feature map produced by a convolutional layer.

Table 5 Performance comparison with some methods for intraoral X-ray images

The present work examines the viability of integrating CVApool with pre-trained CNN models for assessing the disease severity of dental structures. As a baseline, we compare the obtained results with traditional CNN models (VGG16, AlexNet) to show that CVApool is useful as an alternative to average pooling. It is also instructive to compare our results with the accuracy obtained for tooth disease recognition by the similar CNN architecture EfficientNetB0.

Table 5 clearly shows that most studies have demonstrated the benefits of pre-trained CNN models. Research on tooth disease identification has mostly been restricted to limited comparisons of the transfer learning methodology on particular CNN types, such as AlexNet [37] and VGG16 [36]. Some studies use custom CNN models, such as CustomAlexNet [17], a fully convolutional network (FCN) [23] and a hybrid neural network (HNN) [8]. The accuracy of our CVApool layer combined with EfficientNetB3, EfficientNetB2, EfficientNetB2 and EfficientNetV2S was 86.4%, 83.8%, 89.4% and 100% on the seven-category, twenty-category, Endodontics and RestorativeDentistry datasets, respectively. Analyzing the simulation outcomes, our results are consistent with the findings for VGG16, whereas the AlexNet method performs weakly. One notable limitation of previous studies is that they have neglected datasets with a large number of classes, namely twenty. The strength of EfficientNet models on smaller class sizes, along with the utilization of CVApool to obtain meaningful features, showed reduced sensitivity to inter-class similarities compared to other forms of CNNs.

The main contributions of this paper are as follows:

  1.

    To develop a new and extensive database of dental disorders, with the purpose of addressing the influence of convolutional neural networks on a wide range of categories, such as Endodontics (6 classes), MaxilloFacialSurgery (2 classes), Periodontics (3 classes), Prosthodontics (4 classes) and RestorativeDentistry (3 classes).

  2.

    Rather than proposing a straightforward CNN model, an effective solution is presented that aids in detecting tooth abnormalities in acquired X-ray images. To this end, this study combines a functional pooling layer with transfer learning models to achieve a noteworthy accuracy rate beyond that of typical average pooling approaches.

  3.

    CVApool, when paired with fine-tuned classification, may extract more discriminant features than the original EfficientNet model by reducing the impact of outliers in a typical batch of data.

5 Conclusion

Applying deep learning methodologies to intraoral X-ray images with dental characteristics improves the accuracy of decision-making about dental health. This paper presented the implementation of a new pooling layer for detecting various dental conditions in X-ray images: abscessed teeth, calculus, caries, cysts, dental bridges, dental crowns, extracted teeth, filling overhang, impacted wisdom teeth, implants, mesialized dentition, mixed dentition, periodontal bone loss, pulpitis, restoration by filling, retained root, screw-retained restoration, single-root canal treatment, two-root canal treatment and three-root canal treatment. To improve the efficiency of the convolutional neural network (CNN) model, an alternative pooling method known as CVApool, which combines the Gram–Schmidt orthogonalization technique with the Common Vector Approach (CVA), was developed and showed a significant level of success. Additional research is required to develop a rapid and efficient common vector approach by utilizing L1 decomposition or efficient eigenvector decompositions.