1 Introduction

Periodontal infections and dental caries are prevalent, non-transmissible conditions and leading causes of tooth loss. Dentists are typically responsible for managing these concerns, while more intricate cases are referred to practitioners in specialized dental disciplines, including endodontics, oral and maxillofacial surgery, periodontics, prosthodontics, orthodontics and restorative dentistry. Timely identification and management of dental conditions can improve treatment outcomes, including halting disease progression, promoting overall well-being and social functioning, and reducing the risk of tooth extraction. Deep learning-based studies play a crucial role in tooth categorization research, assisting dentists in understanding disease types more accurately, enhancing diagnostic capabilities, and facilitating the development of more effective treatment methods.

Among the modalities of digital dental radiography [2], intraoral X-rays [1] are distinguished from panoramic X-rays [3], cephalometric X-rays [4] and cone beam computed tomography (CBCT) [5] by how widely they are used across the phases of oral healthcare. Periapical and bitewing images [6] are intraoral X-rays that capture three to four teeth per image and provide information on the bone and surrounding oral tissue.

For improving dental disease diagnosis and treatment planning, deep learning [7] has significantly influenced the processing of intraoral X-ray images, including image processing [8], segmentation [9,10,11,12] and enhancement [13,14,15]. These developments, integrating intraoral X-ray imaging with deep learning techniques, enhance the precision with which oral health conditions are recognized and detected, such as dental caries [16,17,18,19,20,21,22], implants [23], oriented teeth [24], restorations by filling [25] and dental materials [26]. This study adapts deep learning techniques on a novel dataset to recognize and detect twenty categories: abscessed teeth, calculus, caries, cysts, dental bridges, dental crowns, extracted teeth, filling overhang, impacted wisdom teeth, implants, mesialized dentition, mixed dentition, periodontal bone loss, pulpitis, restoration by filling, retained root, screw-retained restoration, single-root canal treatment, two-root canal treatment and three-root canal treatment.

The Common Vector Approach (CVA) is used here as a pooling technique to enhance the accuracy of deep learning models and to overcome limitations associated with average pooling [27,28,29,30,31,32]. The CVA concept is derived from Principal Component Analysis (PCA); whereas PCA projects onto the eigenvectors with the largest eigenvalues, CVA projects onto those with the smallest eigenvalues to extract more informative and discriminant features. Using CVApool, outlier features in a batch of data can be suppressed by exploiting the difference and common vectors computed for that batch, yielding a more meaningful representation of the feature vectors.

In this study, a deep convolutional neural network (CNN) model is developed using CVApool to enhance the accuracy of diagnosing dental diseases from intraoral X-rays. Initially, a CNN was trained to automatically assess the health state of X-ray images, categorizing them based on a blind test set of tooth samples. Subsequently, the CVApool layer was developed to reduce dimensionality by coupling the Gram–Schmidt orthogonalization technique with the CVA method. CVApool was applied to each data batch (matrix), where each column vector corresponds to a unique feature vector. Finally, several experiments were conducted using various deep learning models, such as EfficientNetB1, B2 and B3, to evaluate the performance of CVApool.

The remainder of this paper is organized as follows: Sect. 2 reviews related studies. Section 3 introduces the CVApool technique and outlines the steps taken to implement the approach in a CNN model. Section 4 presents a comprehensive summary of the proposed method’s performance across various datasets based on objective evaluations. The concluding section discusses the implications of the findings.

2 Related work

After the digitalization of dental imaging, researchers have been interested in developing AI algorithms for classifying dental conditions from intraoral X-ray images. One such study developed a Dental Diagnosis System based on feature extraction, segmentation, classification and decision-making for classifying dental conditions, including root fracture, teeth, decay, missing teeth and resorption of periodontal bone, with an accuracy of 92.74% [33]. In another study, geometric features of dental images were extracted to detect dental caries with a detection precision of 97% [34]. Geetha et al. developed a diagnostic system using statistical feature extraction and a back-propagation neural network for detecting dental caries in digital radiographs and achieved an accuracy of 97.1% [35].

In recent studies, researchers have focused on validating the efficiency and accuracy of deep learning algorithms for analyzing intraoral X-ray images. After the development of CNNs, deep learning architectures for dental imaging evolved as variations of CNNs, such as VGGNet [36], GoogLeNet [21], AlexNet [37], EfficientNet [38] and DenseNet [39]. For example, Lee et al. used a pre-trained GoogLeNet Inceptionv3 CNN with transfer learning and achieved high diagnostic accuracies for detecting dental caries in radiographic images [21]. Vasdev et al. compared ResNet18, ResNet34 and AlexNet for classifying caries, root canal treatment, abscess, bone loss and missing teeth over 16,000 dental images, with an accuracy of up to 85.2%. Chen et al. combined EfficientNetB0 and fully connected layers for periodontitis and dental caries recognition and achieved a precision of 97.1%.

Deep learning models automatically extract relevant features and classify dental images based on the labels of X-ray images, and several studies classify dental abnormalities with such models. Imak et al. developed a multi-input deep convolutional neural network ensemble model using machine learning and image processing techniques and achieved an accuracy score of 99.13% for diagnosing dental caries [17]. Zhu et al. developed a diagnosis method to predict and localize caries lesions on periapical X-ray images based on artificial intelligence and the Faster R-CNN method and achieved a precision of 73.49% [22]. Kohlakala et al. trained a fully convolutional network (FCN) on generated X-ray images to identify dental implant connections and achieved a classification accuracy of 71.7% [23]. Leo and Reddy categorized caries lesions based on their type and severity by combining Artificial Neural Network (ANN) and Deep Neural Network (DNN) techniques into a Hybrid Neural Network (HNN) and achieved an accuracy of 96% on the classification of caries levels [8]. Aparna et al. estimated the level of filling in teeth on X-ray images using Mask R-CNN with a ResNet50 architecture [25]. Singh and Sehgal developed a deep convolutional neural network (CNN) with a Long Short-Term Memory (LSTM) model for the detection and diagnosis of dental caries.

Convolutional neural networks (CNNs) are a type of artificial neural network developed specifically to handle signals, sequences, images or volumetric data. CNNs have demonstrated their efficacy in many tasks, including fault analysis [40, 41], image segmentation and image recognition. Traditional machine learning methodologies require feature extraction or engineering to obtain precise predictions; the success of such feature-driven algorithms depends on the characteristics of the data and incurs substantial computing expense on huge datasets. Deep learning models, by contrast, transform raw data into several layers of intermediate features during training, without human intervention. Moreover, the generalization capacity of traditional methods is restricted, whereas deep learning models exhibit unprecedented generalization performance. Despite requiring a substantial quantity of labeled data, high-performance hardware and training time, CNN-based algorithms exhibit superior processing capabilities and outperform commonly used machine learning methods. Considering these facts, we employ a new CNN pooling layer for tooth disease classification.

3 Proposed pooling layer

3.1 Definition and contributions of CVA

The subspace-based recognition approach known as the Common Vector Approach (CVA) has demonstrated favorable outcomes across various applications and classification challenges. The CVA concept is derived from the underlying principles of Principal Component Analysis (PCA). PCA projects the data onto the eigenvectors corresponding to the largest eigenvalues; the CVA technique reverses this process, projecting the data onto the eigenvectors associated with the smallest eigenvalues. It has been shown that the eigenvectors associated with the smallest (or zero) eigenvalues are the most representative of a class, as they exhibit the shared properties of its vectors without any variance. There is a strong relationship between the common vector, the indifference subspace (null space) and the zero principal components. One can therefore say that the common vector is a special vector describing the characteristics shared by all instances of a class.
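In eigenspace terms (our notation, not one of the paper’s numbered equations): if \(\mathbf{e}_j\) are the eigenvectors of a class’s scatter matrix with eigenvalues \(\lambda_j\), PCA keeps the projections onto the leading eigenvectors, whereas the common vector keeps the projections onto those with (near-)zero eigenvalues:

$$\mathbf{a}_{PCA} = \sum_{j:\,\lambda_j\ \mathrm{largest}} \left\langle \mathbf{a}, \mathbf{e}_j \right\rangle \mathbf{e}_j, \qquad \mathbf{a}_{com} = \sum_{j:\,\lambda_j\, \approx\, 0} \left\langle \mathbf{a}, \mathbf{e}_j \right\rangle \mathbf{e}_j$$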

The definition of the common vector is not complex in terms of implementation. Suppose that we are given m samples of a processed class, \(\{\mathbf{a}_i\},\ i = 1, 2, \ldots, m\). Each vector \(\mathbf{a}_i\) can then be written as the sum \(\mathbf{a}_i = \mathbf{a}_{com} + \mathbf{a}_{i,diff}\). In other words, the common vector \(\mathbf{a}_{com}\) is what remains after the difference vectors of the class members are subtracted, and it is constant across the whole class. The vector \(\mathbf{a}_{i,diff}\), in turn, is the remaining vector, which captures the unique residual trend of that specific sample.

The vector encompassing the attributes that remain unchanged is called the common vector. The CVA method also provides an effective means of dimensionality reduction. Depending on the vector size (n) and the number of samples (m) in each class, two scenarios may arise during the extraction of the common vector. If m > n, we have the so-called sufficient case, where the number of observations is larger than the number of features. The second is the insufficient case (m ≤ n). The key benefit of CVA over other subspace-based classification techniques is that it yields a solution in this so-called insufficient data situation. Considering the output of a standard CNN, the length of a feature vector is greater than the number of vectors in a class, so this output is an excellent illustration of the insufficient data situation.

In mathematical terms, the common vector may be determined by subtracting from the average vector the sum of the projections of the average vector onto the orthonormal basis. We may thus state that the common vector differs from the mean vector. A numerical demonstration is provided below, and the difference in performance is compared on a straightforward example, to clarify how the contribution of the common vector compares to that of the average vector. Since the output of a CNN is handled in vector format, the insufficient data situation has been the primary focus of our investigation.

Let us assume that we are given m samples. Here \(\mathbf{a}_i\) denotes one training vector of the batch data, \(\mathbf{a}_{com}\) indicates the common vector of the processed CNN output for a single batch, and \(\mathbf{z}_i\) are the orthonormal basis vectors returned by Gram–Schmidt orthogonalization. The following expressions then hold for all training set vectors:

$$\begin{aligned} \mathbf{a}_1 &= \mathbf{a}_{com} + \left\langle \mathbf{a}_1, \mathbf{z}_1 \right\rangle \mathbf{z}_1 + \left\langle \mathbf{a}_1, \mathbf{z}_2 \right\rangle \mathbf{z}_2 + \ldots + \left\langle \mathbf{a}_1, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1} \\ \mathbf{a}_2 &= \mathbf{a}_{com} + \left\langle \mathbf{a}_2, \mathbf{z}_1 \right\rangle \mathbf{z}_1 + \left\langle \mathbf{a}_2, \mathbf{z}_2 \right\rangle \mathbf{z}_2 + \ldots + \left\langle \mathbf{a}_2, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1} \\ &\;\;\vdots \\ \mathbf{a}_m &= \mathbf{a}_{com} + \left\langle \mathbf{a}_m, \mathbf{z}_1 \right\rangle \mathbf{z}_1 + \left\langle \mathbf{a}_m, \mathbf{z}_2 \right\rangle \mathbf{z}_2 + \ldots + \left\langle \mathbf{a}_m, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1} \end{aligned}$$
(1)

By summing both sides of the preceding equations over all samples and then dividing by m, we get:

$$\begin{aligned} \sum_{i=1}^{m} \mathbf{a}_i &= m\,\mathbf{a}_{com} + \left( \sum_{i=1}^{m} \left\langle \mathbf{a}_i, \mathbf{z}_1 \right\rangle \right) \mathbf{z}_1 + \ldots + \left( \sum_{i=1}^{m} \left\langle \mathbf{a}_i, \mathbf{z}_{m-1} \right\rangle \right) \mathbf{z}_{m-1}, \\ \frac{1}{m}\sum_{i=1}^{m} \mathbf{a}_i &= \mathbf{a}_{com} + \left\langle \frac{1}{m}\sum_{i=1}^{m} \mathbf{a}_i, \mathbf{z}_1 \right\rangle \mathbf{z}_1 + \ldots + \left\langle \frac{1}{m}\sum_{i=1}^{m} \mathbf{a}_i, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1}, \quad \text{and} \\ \mathbf{a}_{com} &= \mathbf{a}_{ave} - \left\langle \mathbf{a}_{ave}, \mathbf{z}_1 \right\rangle \mathbf{z}_1 - \left\langle \mathbf{a}_{ave}, \mathbf{z}_2 \right\rangle \mathbf{z}_2 - \ldots - \left\langle \mathbf{a}_{ave}, \mathbf{z}_{m-1} \right\rangle \mathbf{z}_{m-1} \end{aligned}$$
(2)

The common vector may thus be found by subtracting from the average vector its projection onto the complete orthonormal basis, as shown above. To analyze the behaviors of the average vector and the common vector, we provide a toy example with three very basic training vectors. Suppose the three vectors are a1 = [1 1 1]T, a2 = [1 1 -1]T and a3 = [1 5 5]T. Normalizing them with the sigmoid activation function gives a1 = [0.7311 0.7311 0.7311]T, a2 = [0.7311 0.7311 0.2689]T and a3 = [0.7311 0.9933 0.9933]T. The common vector of these three vectors can be obtained by constructing the difference subspace (b1 and b2) with a1 as the reference vector:

$${\mathbf{b}}_{{\mathbf{1}}} = \left[ {0\;0\; - 0.4621} \right]^{{\text{T}}} \quad {\text{and}}\quad {\mathbf{b}}_{{\mathbf{2}}} = \left[ {0\;0.2622\;0.2622} \right]^{{\text{T}}}$$
(3)

We then proceed to Gram–Schmidt orthogonalization, applying the following rules to obtain the common vector for the training set:

$${\mathbf{d}}_{1} = {\mathbf{b}}_{1} ,\quad {\mathbf{z}}_{1} = \frac{{{\mathbf{b}}_{1} }}{{\left\| {{\mathbf{b}}_{1} } \right\|}} = \left[ {0\;0\; - 1} \right]^{{\text{T}}}$$
(4)
$${\mathbf{d}}_{2} = {\mathbf{b}}_{2} - \left\langle {{\mathbf{b}}_{2} ,{\mathbf{z}}_{1} } \right\rangle {\mathbf{z}}_{1} = \left[ {0\;0.2622\;0} \right]^{{\text{T}}} ,\quad {\mathbf{z}}_{2} = \frac{{{\mathbf{d}}_{2} }}{{\left\| {{\mathbf{d}}_{2} } \right\|}} = \left[ {0\;1\;0} \right]^{{\text{T}}}$$
(5)

Here (z1, z2, …, z(m-1)) denotes the orthonormal basis, obtained by normalizing the orthogonalized difference vectors (d1, d2, …, d(m-1)). As a result, the sum of the projections of a1 onto the orthonormal basis of the difference subspace B, denoted asum, may be computed in the following manner:

$${\mathbf{a}}_{{{\mathbf{sum}}}} = \left\langle {{\mathbf{a}}_{1} ,{\mathbf{z}}_{1} } \right\rangle {\mathbf{z}}_{1} + \left\langle {{\mathbf{a}}_{1} ,{\mathbf{z}}_{2} } \right\rangle {\mathbf{z}}_{2} = \left[ {0\;0.7311\;0.7311} \right]^{{\text{T}}}$$
(6)

In conclusion, the common vector is obtained by subtracting the corresponding projection sum from either the reference vector or the average vector. Using the reference vector (a1), the common vector is determined as shown in Eq. (7).

$${\mathbf{a}}_{{{\mathbf{com}}}} = {\mathbf{a}}_{{\mathbf{1}}} - {\mathbf{a}}_{{{\mathbf{sum}}}} = \left[ {0.7311\quad 0\quad 0} \right]^{{\text{T}}}$$
(7)

Working with the same set of vectors, one can also obtain the common vector by first calculating the average vector, defined as follows:

$${\mathbf{a}}_{{{\mathbf{ave}}}} = \frac{1}{{\mathbf{m}}}\sum\limits_{{{\mathbf{i}} = 1}}^{{\mathbf{m}}} {{\mathbf{a}}_{{\mathbf{i}}} } = \left[ {0.7311\quad 0.8185\quad 0.6644} \right]^{{\text{T}}}$$
(8)

Then, we project \(\mathbf{a}_{ave}\) onto the orthonormal basis:

$${\mathbf{a}}_{{{\mathbf{ave}},{\mathbf{sum}}}} = \left\langle {{\mathbf{a}}_{{{\mathbf{ave}}}} ,{\mathbf{z}}_{1} } \right\rangle {\mathbf{z}}_{1} + \left\langle {{\mathbf{a}}_{{{\mathbf{ave}}}} ,{\mathbf{z}}_{2} } \right\rangle {\mathbf{z}}_{2} = \left[ {0\quad 0.8185\quad 0.6644} \right]^{{\text{T}}}$$
(9)

As a result, the following would be the common vector:

$${\mathbf{a}}_{{{\mathbf{com}}}} = {\mathbf{a}}_{{{\mathbf{ave}}}} - {\mathbf{a}}_{{{\mathbf{ave}},{\mathbf{sum}}}} = \left[ {0.7311\quad 0\quad 0} \right]^{{\text{T}}} ,$$
(10)

This is exactly the same vector found via Eq. (7). Nevertheless, if we compute the magnitudes of the differences between a test sample and, respectively, the average vector and the common vector, we observe a significantly different margin for classification:

$$Dist_{1}^{ave} = \left\| {{\mathbf{test}} - {\mathbf{a}}_{{{\mathbf{ave}}}} } \right\|^{2} = 0.1862,\quad Dist_{1}^{com} = \left\| {{\mathbf{test}} - {\mathbf{a}}_{{{\mathbf{com}}}} } \right\|^{2} = 0.8857$$
(11)

These contrasting distances suggest that the common vector and the average vector behave radically differently as class representatives. The resulting Dist values serve as a reference point for comparing the capabilities of the average vector and the common vector: for the best classification performance, we want this distance to be as large as feasible. Clearly, \(Dist_{1}^{com}\) produces a substantially larger margin, whereas \(Dist_{1}^{ave}\) is very small. Compared with the average-vector experiment, the toy example demonstrates that CVA has the potential to enhance the margin between test samples and the common vector of a class.
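For readers who want to verify these numbers, the following NumPy sketch (our illustration, not the authors’ released code) reproduces the toy example end to end and confirms that both routes yield the same common vector:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The three raw training vectors, squashed by the sigmoid as in the text.
a1, a2, a3 = sigmoid(np.array([[1.0, 1.0, 1.0],
                               [1.0, 1.0, -1.0],
                               [1.0, 5.0, 5.0]]))

# Difference subspace with a1 as the reference vector (Eq. 3).
b1, b2 = a2 - a1, a3 - a1

# Gram-Schmidt orthonormalization (Eqs. 4-5).
z1 = b1 / np.linalg.norm(b1)          # [0, 0, -1]
d2 = b2 - np.dot(b2, z1) * z1
z2 = d2 / np.linalg.norm(d2)          # [0, 1, 0]

# Route 1: subtract the projection of the reference vector (Eqs. 6-7).
a_sum = np.dot(a1, z1) * z1 + np.dot(a1, z2) * z2
a_com_ref = a1 - a_sum                # [0.7311, 0, 0]

# Route 2: subtract the projection of the average vector (Eqs. 8-10).
a_ave = (a1 + a2 + a3) / 3
a_ave_sum = np.dot(a_ave, z1) * z1 + np.dot(a_ave, z2) * z2
a_com_ave = a_ave - a_ave_sum         # [0.7311, 0, 0]

assert np.allclose(a_com_ref, a_com_ave)
```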

3.2 Using CVA as pooling layer

The CVA approach is utilized as a pooling process to provide a meaningful and uniform representation of the feature vectors. To clarify, the common vector of a column representation of the k feature vectors can be obtained in one of two ways: first, by employing an eigenspace model made up of the eigenvectors corresponding to the smallest eigenvalues, and second, by utilizing the Gram–Schmidt orthogonalization procedure to obtain the orthonormal vectors of the processed data in the insufficient data case. We have focused on the Gram–Schmidt orthogonalization process rather than the eigendecomposition, since finding the eigenvectors for a large data dimension demands a significant amount of computer memory.

The pure PCA approach suffers from a number of drawbacks, most notably the high computing cost of eigenvector decompositions, the need to standardize the data and the loss of information. Considering these negative impacts, we developed the CVA approach to create an accurate distance map between the test samples and the class samples. During classification, categorical cross-entropy loss was found to offer superior results when running CVApool with CNN models. The following steps provide additional detail on the implementation of CVA for a single batch of data.

The general stages involved in putting the CVApool concept into action may be summarized as follows. Suppose the CNN produces batch data of size b × k × n × n, for example (64 × 1280 × 7 × 7). Each sample can be reshaped as (1280 × 49), and we find the common vector for that sample, returned as (1 × 1280), since there are 49 vectors of dimension 1280. Eventually we obtain (64 × 1280) data, since the batch size is 64. The common vector of the batch data, ab,com, is computed through the following steps (a code sketch is given after the steps).

  • To begin, the difference vector ab,diff is derived by first projecting the reference vector (at) onto the orthonormal basis (z1, z2, …, z(k-1)), which is accomplished by the Gram–Schmidt process.

    $${\mathbf{a}}_{{{\mathbf{b}},{\mathbf{diff}}}} = \left\langle {{\mathbf{a}}_{{\mathbf{t}}} ,{\mathbf{z}}_{1} } \right\rangle {\mathbf{z}}_{1} + \left\langle {{\mathbf{a}}_{{\mathbf{t}}} ,{\mathbf{z}}_{2} } \right\rangle {\mathbf{z}}_{2} + \ldots + \left\langle {{\mathbf{a}}_{{\mathbf{t}}} ,{\mathbf{z}}_{{{\text{k}} - 1}} } \right\rangle {\mathbf{z}}_{{{\text{k}} - 1}}$$
    (12)
  • Following this, a common vector of batch data may be obtained by subtracting the ab,diff from the reference vector (at), as seen in Eq. (13).

    $${\mathbf{a}}_{{{\mathbf{b}},{\mathbf{com}}}} = {\mathbf{a}}_{{\mathbf{t}}} - {\mathbf{a}}_{{{\mathbf{b}}{,}{\mathbf{diff}}}}$$
    (13)
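As a concrete illustration of these steps, the following PyTorch module is a minimal sketch of CVApool under our own naming (the class name CVAPool and the eps guard are ours); it is not the authors’ released implementation. It reshapes each b × k × n × n feature map into n² column vectors of dimension k, builds the difference subspace against the first vector as reference, orthonormalizes it with Gram–Schmidt, and subtracts the projection of the reference vector (Eqs. 12 and 13):

```python
import torch
import torch.nn as nn

class CVAPool(nn.Module):
    """Pools a (b, k, n, n) feature map to (b, k) by computing, per sample,
    the common vector of its n*n spatial feature vectors (Eqs. 12-13).
    Illustrative sketch; a batched/vectorized version would be faster."""

    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.eps = eps  # tolerance for dropping (nearly) dependent vectors

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, k, h, w = x.shape
        feats = x.reshape(b, k, h * w)          # e.g. (64, 1280, 49)
        pooled = []
        for s in range(b):
            vectors = feats[s].T                # (m, k): m = h*w vectors of dim k
            a_t = vectors[0]                    # reference vector
            diffs = vectors[1:] - a_t           # difference subspace vectors
            # Gram-Schmidt orthonormalization of the difference vectors.
            basis = []
            for d in diffs:
                for z in basis:
                    d = d - torch.dot(d, z) * z
                norm = torch.linalg.norm(d)
                if norm > self.eps:
                    basis.append(d / norm)
            # Eq. (12): projection of the reference onto the orthonormal basis.
            a_diff = torch.zeros_like(a_t)
            for z in basis:
                a_diff = a_diff + torch.dot(a_t, z) * z
            # Eq. (13): the common vector for this sample.
            pooled.append(a_t - a_diff)
        return torch.stack(pooled)              # (b, k), e.g. (64, 1280)
```

Since Gram–Schmidt and a thin QR factorization span the same subspace, torch.linalg.qr on the stacked difference vectors could replace the explicit loops at lower cost; the loop form above is kept only to mirror the equations.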

Figure 1 shows the overall system for tooth disease categorization. Experiments were carried out using various deep learning models, including EfficientNetB1 [42], EfficientNetB2, EfficientNetB3 and EfficientNetV2S. The batch size is set to 32, and the optimizer is Adam [43]. The learning rate starts at 0.001 and is regulated by a coefficient of 0.3. Training runs for 60 epochs for the pre-trained models and 100 epochs for the non-pre-trained models. Strategies such as random resized crops and flip augmentations are used to improve performance.
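Purely as a hedged sketch of this configuration, the snippet below wires the CVAPool module from the previous sketch into a torchvision EfficientNetB2; the plateau-based scheduler, the 260-pixel crop and the pre-trained weight choice are our assumptions rather than details stated in the paper:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_CLASSES = 20  # the twenty dental condition classes

# Pre-trained backbone; swap global average pooling for CVAPool.
model = models.efficientnet_b2(weights=models.EfficientNet_B2_Weights.DEFAULT)
model.avgpool = CVAPool()  # output is (b, k), so the subsequent flatten is a no-op
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, NUM_CLASSES)

# Reported settings: Adam, initial learning rate 0.001, regulated by a 0.3 factor.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.3)
loss_fn = nn.CrossEntropyLoss()  # categorical cross-entropy, as noted in Sect. 3.2

# Reported augmentations: random resized crops and flips.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(260),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```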

Fig. 1

Proposed tooth disease classification system

4 Experimental study

4.1 Dataset

Dental X-ray images were collected from 1887 patients at a dental clinic in Ankara, Turkey, between 2006 and 2023 using dental imaging software. 2971 images are included in the dataset, divided into 6 categories and 20 classes, as summarized in Table 1. Figure 2 presents samples of the considered tooth types.

Table 1 Experimental dataset
Fig. 2

An example X-ray image of 1. abscessed teeth; 2. calculus; 3. caries; 4. cysts; 5. dental bridges; 6. dental crowns; 7. extracted teeth; 8. filling overhang; 9. impacted wisdom teeth; 10. implant; 11. mesialized dentition; 12. mixed dentition; 13. periodontal bone loss; 14. pulpitis; 15. restoration by filling; 16. retained root; 17. screw-retained restoration; 18. single-root canal treatment; 19. two-root canal treatment; 20. three-root canal treatment

4.2 Performance evaluation on tooth dataset

Accurate decision-making facilitated by artificial intelligence enables dentists to propose a range of treatment options for restoring damaged teeth; these options help recover the aesthetic, anatomical and physiological functions of teeth. For this reason, we conducted various tests using small and large dental datasets.

The initial experiment segmented the entire dataset into seven distinct dental clinical departments, following terminology devised by dental professionals; these departments include orthodontics, endodontics, maxillofacial surgery, prosthodontics, periodontics and restorative dentistry. Category-wise accuracy and f-measure were investigated, and the confusion matrices for each category are presented. The second experiment increased the total number of classes to assess the models’ ability to model the data properly; consequently, performance was assessed on a comprehensive set of twenty dental problems, encompassing all diseases encountered across the seven dental clinics.

Furthermore, a comparative analysis of the experimental results was conducted using both pre-trained and untrained convolutional neural network (CNN) models. The CNN models’ performance under test settings was evaluated with objective measures, namely accuracy and f1-score. Table 2 displays the discrimination accuracy for the seven categories, disregarding the subcategories within each class.

Table 2 Performance results for seven categories

Also, Table 3 presents the performance scores on the 20 forms of dental illness drawn from these categories: orthodontics, endodontics, maxillofacial surgery, prosthodontics, periodontics, restorative dentistry and other dental professions. The estimation results of the deep learning architectures EfficientNetB1, EfficientNetB2, EfficientNetB3 and EfficientNetV2S are presented in Tables 2 and 3. For each pooling layer, we analyzed the trade-offs associated with each experiment.

Table 3 Performance results for all of 20 distinct tooth categories

According to the findings in Table 3, the highest accuracy, 86.36%, is attained by the pre-trained EfficientNetB3 model with CVApool. The EfficientNetB2 model earned the greatest f-score, at 0.8327. Compared with Table 2, the number of classes was increased during training of the CNN models. Notably, our analysis reveals that including the CVApool layer enhanced performance across nearly all CNN models, particularly when the number of classes in dental disease classification was raised to twenty disease types. Furthermore, it may be inferred that complex CNN models often overfit on large datasets; conversely, the model with the fewest parameters, EfficientNetB2, demonstrated the highest recognition performance.

The third experiment is a class-wise evaluation for each dental clinic field, as shown in Table 4. The number of classes per category is as follows: Endodontics (6 classes), MaxilloFacialSurgery (2 classes), Periodontics (3 classes), Prosthodontics (4 classes) and RestorativeDentistry (3 classes). The accuracy on test samples is given in Table 4. Comparing CNN models with CVApool and AVGpool layers on class-wise disease identification, the proposed CVA-based pooling method significantly increases the accuracy rates.

Table 4 Performance results on class-wise evaluation

In summary, the CVApool layer produced the following classification scores: Endodontics (EfficientNetB2, 89.41%), MaxilloFacialSurgery (EfficientNetB3, 95.45%), Periodontics (EfficientNetB2, 89.41%), Prosthodontics (EfficientNetB2, 100%) and RestorativeDentistry (EfficientNetB3, 100%). Performance declined as the number of classes increased; the generalization capacity of convolutional neural networks is a plausible explanation for this phenomenon. The results suggest that replacing the AVGpool layer with CVApool can enhance the classification and prediction scores of a typical image-based scenario.

The confusion matrices obtained with the EfficientNetB2 architecture using AVGpool and CVApool were employed to classify the various dental disorders. Figure 3 presents them to facilitate visualization and analysis of the misclassified samples within each group. The model exhibits suboptimal performance in Endodontics when distinguishing between abscessed teeth and cysts. In the MaxilloFacialSurgery category, the accuracy of diagnosing impacted wisdom teeth and retained root conditions is roughly balanced. In Periodontics, four instances were misclassified between the calculus and periodontal bone loss categories. There are no misclassified samples in Prosthodontics. Accuracy is similarly balanced within RestorativeDentistry.

Fig. 3

Confusion matrices for categories using CVApool with pre-trained EfficientNetB2

As a further evaluation of CVApool, a series of experiments was conducted with various activation functions on the Caltech-101 dataset, partitioned into 80% training and 20% validation. The accuracy achieved by CVA without an activation function was 95.3390%; CVApool achieves 95.9880% with the tanh activation function and 96.401% with the sigmoid activation function. CVApool was also applied with different activation functions on the dental dataset, resulting in an accuracy of 96.401% with sigmoid activation. These results suggest that CVApool produced the most favorable outcomes on the positive weights of the CNN’s features.

The experiments were conducted on the Google Colab platform using a Tesla T4 GPU. The NVIDIA Tesla T4 is a Graphics Processing Unit (GPU) based on the Turing architecture and is specifically designed to accelerate deep learning inference. The card is equipped with 40 streaming multiprocessors (SMs) sharing a 6 MB L2 cache, along with 16 GB of GDDR6 memory directly linked to the processor.

4.3 Discussion

Dental diseases play a crucial role in an individual’s overall health, since problems related to teeth can lead to a range of health difficulties and a decrease in quality of life. Oral health is therefore central to assessing one’s overall health, affecting both general well-being and the ability to consume a wide range of foods. Identifying and treating dental conditions can decrease the probability of acquiring cardiovascular disease, diabetes and other ailments associated with oral bacteria. However, identification through expert evaluation is a time-consuming and potentially costly endeavor. To tackle these issues, artificial intelligence techniques may be applied to efficiently identify dental diseases, facilitating accurate treatment outcomes (Fig. 4).

Fig. 4

Performance of CVApool on various activation functions

To uncover the role of activation functions in conjunction with CVApool, an ablation study is conducted using the CNN algorithm. The pre-trained EfficientNetB2 is evaluated using conventional data augmentation techniques. The potential of ReLU6, Swish, Mish, Softplus, SELU and Hardswish has been examined on the Endodontics dataset; the experiment alters the activation function supplying features to CVApool. Swish demonstrated an admirable degree of accuracy in identifying the tooth disease type when integrated with our proposed CVApool layer.

Convolutional neural network (CNN) designs offer a notable benefit over earlier methodologies due to their remarkable reusability. Table 5 lists several deep learning studies on the classification of dental abnormalities. Variations in accuracy rate may be attributed to model complexity, generalization capacity, and the presence of dropout and pooling layers: a dropout layer is usually incorporated to mitigate overfitting, whereas a pooling layer compresses the features in a feature map produced by a convolutional layer.

Table 5 Performance comparison with some methods for intraoral X-ray images

The present work examines the viability of integrating CVApool with pre-trained CNN models for assessing the disease severity of dental structures. As a baseline, we compare the obtained results with traditional CNN models (VGG16, AlexNet) to show that CVApool is useful as an alternative to average pooling. It is also instructive to compare our results with the accuracy obtained for tooth disease recognition by the similar CNN architecture EfficientNetB0.

Table 5 clearly shows that most studies have demonstrated the benefits of pre-trained CNN models. Research on tooth disease identification has mostly been restricted to limited comparisons of the transfer learning methodology on particular CNN types, such as AlexNet [37] and VGG16 [36]. Some studies use custom CNN models, such as CustomAlexNet [17], a fully convolutional network (FCN) [23] and a hybrid neural network (HNN) [8]. The accuracy of our CVApool layer combined with EfficientNetB3, EfficientNetB2, EfficientNetB2 and EfficientNetV2S was 86.4%, 83.8%, 89.4% and 100% on the seven-category, twenty-category, Endodontics and RestorativeDentistry datasets, respectively. Analyzing the simulation outcomes, our results are consistent with the findings for VGG16, whereas the AlexNet method performs weakly. One notable limitation of previous studies is that they have neglected datasets with a large number of classes, namely twenty. The strength of EfficientNet models on smaller class sizes, along with the utilization of CVApool to obtain meaningful features, showed reduced sensitivity to inter-class similarities compared to other forms of CNNs.

The main contributions of this paper are as follows:

  1.

    To develop a new and extensive database of dental disorders, with the purpose of addressing the influence of convolutional neural networks on a wide range of categories, such as Endodontics (6 classes), MaxilloFacialSurgery (2 classes), Periodontics (3 classes), Prosthodontics (4 classes) and RestorativeDentistry (3 classes).

  2.

    Rather than proposing a straightforward CNN model, an effective solution is presented that aids in detecting tooth abnormalities in acquired X-ray images. To this end, this study combines a functional pooling layer with transfer learning models to achieve a noteworthy accuracy rate beyond that of typical average pooling approaches.

  3.

    CVApool, when paired with fine-tuned classification, may extract more discriminant features than the original EfficientNet model by reducing the impact of outliers in a typical batch of data.

5 Conclusion

Applying deep learning methodologies to intraoral X-ray images with dental characteristics improves the accuracy of decision-making about dental health. This paper presented the implementation of a new pooling layer for detecting various dental conditions in X-ray images: abscessed teeth, calculus, caries, cysts, dental bridges, dental crowns, extracted teeth, filling overhang, impacted wisdom teeth, implants, mesialized dentition, mixed dentition, periodontal bone loss, pulpitis, restoration by filling, retained root, screw-retained restoration, single-root canal treatment, two-root canal treatment and three-root canal treatment. To improve the efficiency of the convolutional neural network (CNN) model, an alternative pooling method known as CVApool, which combines the Gram–Schmidt orthogonalization technique with the Common Vector Approach (CVA), was developed and showed a significant level of success. Additional research is required to develop a rapid and efficient common vector approach by utilizing L1 decomposition or efficient eigenvector decompositions.