Introduction

Breast cancer is a prevalent disease that poses a threat to women worldwide. It ranks as one of the three most fatal diseases for women due to its high mortality rate [1,2,3]. According to the World Health Organization (WHO), more than 2.26 million new cases and 685 thousand deaths were reported in 2020 [4]. Early diagnosis of breast cancer can reduce the mortality rate and limit tissue damage [5]. Breast tumors fall into two main categories: benign (non-cancerous) and malignant (cancerous). These tumors can be examined through various non-invasive imaging techniques, such as mammography, ultrasound, computed tomography (CT), and magnetic resonance (MR) imaging. Ultrasonography is one of the most popular imaging modalities as it is painless, cheap, and radiation free [6,7,8,9]. Tumor patterns and characteristics can be distinguished through analysis of ultrasound images. However, manual analysis is very time consuming, and the number of patients continues to increase.

Computer-aided diagnosis (CAD) systems for breast cancer can therefore support radiologists by providing improved and swift interpretation. Both benign and malignant tumors result from abnormal growth of breast tissue, but benign tumors do not spread to other areas of the body, have a regular oval shape, and usually consist of a smooth lesion [10]. Malignant tumors have microlobulated or angular margins, indicating an irregular pattern, and can spread to other organs if left untreated [11, 12]. To develop an effective CAD system, it is crucial to incorporate the characteristics of these tumors into an automated analysis system. Generally, CAD systems utilize two-dimensional (2D) ultrasound images of breast tumors, which cannot give insight into relevant features such as tumor volume [13] and overall tumor surface structure. On the other hand, three-dimensional (3D) imaging techniques such as CT, MR, and mammography have disadvantages such as a high radiation dose or financial cost. Moreover, unlike 3D instruments, 2D imaging instruments are widely available in hospitals [14].

Graph-based models utilizing the radiologists’ decision strategy have proven promising and reliable for automated diagnosis applications [15]. Therefore, in this study, we propose a breast tumor classification approach with a graph attention network (GAT) utilizing clinically meaningful features extracted from a 3D mesh reconstructed from the 2D breast tumor region of interest (ROI). A breast tumor dataset consisting of two classes, benign and malignant, is utilized. The 3D mesh is generated from the 2D breast tumor ROI utilizing the point-e system proposed by Nichol et al. [16]. Ten medically relevant mesh features are extracted from the 3D mesh: centroid distance, minimum and maximum distance of the mesh bounding box, surface area, volume, change of curvature, sphericity, anisotropy, eigen entropy, and farthest distance. Subsequently, feature selection techniques, namely minimum redundancy maximum relevance (MRMR) and the analysis of variance (ANOVA) test, are conducted to assess the contribution of the features to the mesh classification, where all features exhibit adequate performance. For a comprehensive understanding of how the features of the different classes form meaningful patterns and influence decision-making, a feature pattern analysis is performed. A feature table is generated with dimensions of 445 × 12, containing ten feature columns, a target column, and a column of unique row IDs. Each row describes the characteristics of one mesh. A graph is generated incorporating the relationships between the rows of the feature table. Considering the rows as nodes, the Spearman correlation coefficient is used to find the edge between two correlated nodes. Two nodes are connected through an edge if the correlation score is greater than or equal to 0.7, so that only strong connections between the nodes are included. This threshold is determined experimentally: the proposed model’s performance is tested with various threshold values (\(\ge\) 0.5, \(\ge\) 0.6, \(\ge\) 0.7, \(\ge\) 0.8, \(\ge\) 0.9), and the model demonstrates its optimal performance with the 0.7 threshold. The generated graph contains 56,054 edges and 445 nodes. An ablation study is conducted for seven variables, resulting in an optimized GAT model with a highest accuracy of 99.34%. Potential overfitting issues are assessed through k-fold cross validation. Results are presented through several performance metrics and statistical analyses. The performance of the proposed model is compared with ten machine learning models and one convolutional neural network (CNN) model. Finally, the performance of this study is compared with previous literature on breast tumor classification. The major contributions of this study are listed below:

  • 3D meshes are generated from the 2D breast tumor ROIs, providing a more informative representation of the tumor.

  • Instead of relying on structure-based imaging features, ten clinically relevant mesh features are extracted from the 3D mesh ROI.

  • To classify the breast tumors, a GAT model is proposed, introducing a graph representation of the mesh features that contributes greatly to the classification performance.

  • To obtain a robust GAT model with higher accuracy, a seven-stage ablation study is performed, yielding the optimal configuration of the architecture.

  • To further evaluate the GAT model’s performance, overfitting and performance consistency are assessed through several experiments.

Related Work

In the field of early diagnosis of breast cancer, deep learning and machine learning approaches have made significant progress. Aljuaid et al. [17] employed a combination of deep neural networks (ResNet 18, ShuffleNet, and Inception-V3Net) and transfer learning on the publicly available BreakHis dataset to provide a computer-aided diagnosis technique for breast cancer categorization (both binary and multi-class). For binary and multi-class classification, the ResNet model yielded the highest accuracies of 99.7% and 97.81%, respectively. Chattopadhyay et al. [18] presented a state-of-the-art approach for classification of breast cancer in which a dual-shuffle attention-guided deep learning model was developed. The proposed model included a channel attention mechanism. They evaluated the model on the BreakHis dataset and obtained classification accuracies of 95.72%, 94.41%, 97.43%, and 98.1% for the four magnification levels, i.e., 40 ×, 100 ×, 200 ×, and 400 ×, respectively. In the study of Abunasser et al. [19], breast cancer classification was carried out employing a transfer learning approach and a proposed CNN-based deep learning model, called BCCNN. The proposed model outperformed the transfer learning models with a classification accuracy of 98.28%. Jabeen et al. [20] proposed an automated system comprising image preprocessing, data augmentation, and deep learning for the classification of breast tumors. A pre-trained model, EfficientNet-b0, was used to extract deep features, and feature fusion was conducted utilizing a serial-based methodology. The experiments were conducted on two datasets, CBIS-DDSM and INbreast, with accuracies of 95.4% and 99.7%, respectively. Obayya et al. [21] developed a mathematical optimization algorithm with a deep learning-based method for classifying breast cancer using histopathological images (AOADL-HBCC). Utilizing image preprocessing, the AOADL-HBCC technique achieved a maximum accuracy of 96.77%. In addition to traditional approaches, graph convolution networks with graph representations have been applied in the detection of breast cancer. Following this strategy, Yao et al. [22] created a model using a graph convolution approach that focuses on automatic identification and classification of calcification distribution patterns in mammographic images. The accuracy of the proposed model was 64.3%, and the area under the curve was 0.74. An alternative method for classification of diseases is to create a point cloud from a 2D image and then generate a mesh from it, as this offers valuable information that can help with diagnosis. Sun et al. [23] developed a point cloud network, LatentPCN, to reconstruct 3D surface models from calibrated biplanar X-ray data. LatentPCN transfers sparse silhouettes made from 2D images to a latent representation, which is used as the input to a decoder to create a 3D bone surface model. To assess the effectiveness of LatentPCN, experiments on two datasets were carried out; the mean reconstruction errors on these two datasets were 0.83 mm and 0.92 mm, respectively. Jia et al. [24] presented a pixel-wise sparse graph reasoning (PSGR) module for the segmentation of COVID-19-infected regions in CT images. Their results show that the PSGR module can successfully capture long-range dependencies, and the segmentation model can accurately segment COVID-19-infected regions. Liu et al. [25] presented a MIcrobial Cancer-association Analysis using a Heterogeneous graph transformer (MICAH) to identify intra-tumoral cancer-associated microbial communities. MICAH combines microbes’ phylogenetic and metabolic links into a heterogeneous graph representation, and a graph attention transformer holistically captures the interactions between intra-tumoral bacteria and cancer tissues. Maken and Gupta [14] analyzed the computational techniques, specifications, and procedures for 3D reconstruction and recommended further research on 3D reconstruction from X-ray images. Laumer et al. [26] proposed an automated approach that infers a high-resolution, individualized 4D (3D plus time) surface mesh of the heart structures from 2D echocardiography video data. The inference of such shape models is a critical first step towards tailored simulations that permit automatic evaluation of the morphology and function of the heart chambers. Hu et al. [27] developed a vision graph neural network (ViG)-based pipeline that can classify breast ultrasound images; the accuracy of the ViG model was 100.00% for binary classification and 87.18% for multiclass classification. Ma et al. [28] presented a deep learning framework to learn domain-specific transformations between the contours of the face and the bones for orthognathic surgery planning. To forecast the changes in facial and skeletal morphology, a bidirectional point-to-point convolutional network (P2P-Conv) was used. According to experimental results with real-subject data, their method outperformed state-of-the-art algorithms in predicting facial and skeletal shapes.

Transforming 2D images into 3D representations and utilizing mesh representations is a novel and promising strategy in the field of disease detection and classification. The concept has been introduced in several CAD-based studies; however, to the best of our knowledge, no one has yet attempted to classify breast ultrasound images with this strategy. Traditionally, breast ultrasound image analysis is conducted in 2D. This has drawbacks because it only shows the breast tissue slice by slice, thus missing important spatial information. A more comprehensive understanding of the structure of breast tissue can be attained by transforming 2D images into 3D representations, opening up new possibilities for improved detection and classification. By converting traditional 2D imaging data into a more comprehensive and informative 3D format, this method presents a novel viewpoint. In this study, the concept is combined with a graph representation of clinically relevant mesh features of ultrasound images. In most previous studies, 2D imaging features were employed with traditional machine learning classifiers, and the clinical relevance was not further investigated. This study also differs from previous research by incorporating graph model-based classification based on established node-to-edge relationships.

Materials and Methods

Dataset

A publicly available dataset [29] is used in this study. It includes 780 breast ultrasound images of 600 female patients aged between 25 and 75 years. The images are in PNG format and categorized into benign, malignant, and normal classes, containing 437, 210, and 133 images, respectively. Ground truth masks for the ROIs are provided for the benign and malignant classes. As this study is focused on breast tumor classification, the 647 ultrasound images of the benign and malignant classes are used. Sample benign and malignant images with their respective masks are shown in Fig. 1.

Fig. 1 Samples of breast ultrasound images. a Image of a benign tumor and b image of a malignant tumor with their corresponding ground truths

Methodology

The methodology pipeline of this study is shown in Fig. 2.

Fig. 2 Overview of the methodology pipeline for breast tumor classification (A: dataset and ROI extraction, B: mesh generation, C: mesh filtering and mesh feature extraction, D: feature selection and graph generation, E: model selection, F: analysis of results)

The first step (A) is tumor ROI segmentation, using a bitwise AND operation between the raw images and the provided ground truth masks. This ensures that only relevant parts are taken into consideration for further analysis. The objective of step B is to create a 3D mesh from the tumor ROI. For this task, the segmented images are fed into the base40M model, which creates a dense point cloud of the ROI using an additional point cloud up-sampler model. The mesh is then generated with a regression-based model and the marching cubes algorithm. In step C, a filtering process is carried out, taking into consideration that some meshes may differ significantly from their 2D image because of the lack of accurate depth information in the raw images. To ensure high-quality 3D meshes of the tumor ROI, the meshes are inspected by hand and only the finest are kept. After filtering, 294 benign and 151 malignant meshes are retained out of 437 benign and 210 malignant images, respectively. Then, ten meaningful features are extracted from the filtered mesh dataset. These features provide information regarding the characteristics of the tissues. In step D, two methods are used for feature selection: ANOVA and MRMR. These techniques enable the identification of the most relevant and informative features, ensuring correctness and reliability of further analysis. Then, a feature pattern analysis is done to better understand the distinctive characteristics of each class. After establishing correlations between the chosen features using a Spearman correlation threshold of 0.7, a graph is constructed, resulting in 445 nodes and 56,054 edges. In step E, a GAT model is developed and the graph is fed into the model. To enhance the model’s performance, an ablation study is conducted with seven parameters: the activation function, hidden units, number of layers, learning rate, batch size, momentum, and number of heads. In the final step, F, a comprehensive analysis of the results is done. A confusion matrix, k-fold cross validation, statistical analysis, and performance comparison with ten machine learning models, as well as a comparison with previous studies, are presented. By using this methodology, this study provides an analysis of breast ultrasound classification and may provide insight and expand our understanding of the features and abnormalities of breast tissue.
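Step A can be reproduced with a few lines of OpenCV. The following is a minimal sketch; the file names are assumptions based on the public dataset’s naming convention:

```python
import cv2

# Load an ultrasound image and its ground truth mask (assumed file names).
image = cv2.imread('benign (1).png', cv2.IMREAD_GRAYSCALE)
mask = cv2.imread('benign (1)_mask.png', cv2.IMREAD_GRAYSCALE)

# Bitwise AND keeps only the pixels inside the ground-truth tumor region.
roi = cv2.bitwise_and(image, image, mask=mask)
cv2.imwrite('benign (1)_roi.png', roi)
```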

Mesh Dataset Generation Using Point-e Network

The breast tumor ROI is extracted using a bitwise AND operation between the ultrasound image and the ground truth. From the breast tumor ROIs, a mesh dataset is generated by the point-e system. The mesh is generated from an RGB point cloud of the 2D breast ROI using a regression-based approach. A stack of diffusion models, in accordance with the study of Nichol et al. [16], is used for the conversion of the 2D image to a 3D point cloud. Figure 3 shows the mesh generation procedure from 2D tumor ROI images.

Fig. 3 Mesh generation from the 2D breast ROI

The base 40M diffusion model is utilized for point cloud generation of the breast ROI. The image is fed into a frozen pretrained contrastive language-image pre-training (CLIP) model, ViT-L/14, a large vision transformer with a 14 × 14 patch size, to produce the conditioning for point cloud generation. The output of the ViT-L/14 CLIP model is then fed into a transformer-based model, which predicts a set of possible conditions and a noised point cloud.

To deal with the noisy and non-uniform point clouds generated by the transformer, an additional up-sampling model is used to obtain a uniform, high-resolution, dense point cloud. The architecture of the up-sampling model is the same as that of the base 40M diffusion model. The base model first generates a low-resolution point cloud of 1k points. The up-sampling model is conditioned on this low-resolution point cloud: the output of the base diffusion model is up-sampled, generating an additional 3k points. Thus, a dense, uniform point cloud of 4k points is constructed. The objective of the base diffusion model is only to generate a low-resolution point cloud that captures the overall structure of the input; a high-resolution point cloud is then generated by the up-sampling model using the structural information obtained from the low-resolution point cloud [16, 30]. Then, utilizing a regression-based model, the signed distance field (SDF) of the point cloud is calculated, and a marching cubes algorithm extracts the mesh from the points. The regression-based model is a regression forest-based method that predicts the location of a grid point within 3D space, which is essential to compute the SDF. The signed distance estimates each point’s distance from the 3D surface boundary: a positive SDF value means the point is outside the 3D surface boundary, and a negative SDF value means the point is inside it. The SDF for a point p can be calculated from the following equations [31]:

$$\pi_{s}(p) = \arg\min_{p'} \left| p' - p \right|$$
(1)
$$\Delta_{s}(p) = p - \pi_{s}(p)$$
(2)
$$d_{s}(p) = \left| \Delta_{s}(p) \right|$$
(3)
$$s_{s}(p) = d_{s}(p)\, \mathrm{sgn}\!\left( \Delta_{s}(p) \cdot n_{s}\left( \pi_{s}(p) \right) \right)$$
(4)

Here, \(\pi_{s}(p)\) denotes the projection (location within 3D space) of the point p onto the surface s. The difference between the point p and its projection \(\pi_{s}(p)\) is denoted by \(\Delta_{s}(p)\). Then, the unsigned distance \(d_{s}(p)\) of the point p from the surface s is calculated. Lastly, the signed distance \(s_{s}(p)\) is estimated by evaluating the agreement between \(\Delta_{s}(p)\) and the surface normal at the projection of p, represented as \(n_{s}(\pi_{s}(p))\). After calculating the signed distance for each point of the point cloud, 3D surface reconstruction, also known as isosurface extraction, is carried out by the marching cubes algorithm. The marching cubes algorithm generates triangles from the signed distances on a voxel grid to approximate the target isosurface [32]. Finally, the 3D point cloud obtained from the 2D image is converted into a solid mesh.
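As a concrete illustration of this pipeline, the sketch below follows the image-to-point-cloud and point-cloud-to-mesh examples shipped with the point-e repository [16]; the model keys and helper names are taken from that codebase and may differ across versions:

```python
import torch
from PIL import Image

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint
from point_e.util.pc_to_mesh import marching_cubes_mesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Base 40M image-conditioned diffusion model (1k points) plus up-sampler (+3k points).
base_model = model_from_config(MODEL_CONFIGS['base40M'], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint('base40M', device))
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['base40M'])

upsampler = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler.eval()
upsampler.load_state_dict(load_checkpoint('upsample', device))
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['upsample'])

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],   # 1k low-resolution + 3k up-sampled = 4k points
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 3.0],
)

roi = Image.open('benign (1)_roi.png')   # segmented tumor ROI from step A
samples = None
for x in sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(images=[roi])):
    samples = x                          # keep the final denoised sample
pc = sampler.output_to_point_clouds(samples)[0]

# Regression-based SDF model + marching cubes (Eqs. 1-4) turn the cloud into a mesh.
sdf_model = model_from_config(MODEL_CONFIGS['sdf'], device)
sdf_model.eval()
sdf_model.load_state_dict(load_checkpoint('sdf', device))
mesh = marching_cubes_mesh(pc=pc, model=sdf_model, batch_size=4096, grid_size=32)

with open('tumor_mesh.ply', 'wb') as f:
    mesh.write_ply(f)
```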

Speckle noise in ultrasound images obscures tissue boundaries, produces deceptive structures, and makes segmentation and classification difficult. As a result, the tumor’s depth, edges, and pixel distributions cannot be identified in some cases [33]. Constructing a 3D view from an image that lacks comprehensive shape and edge information is a complex task, as the incomplete parts must be predicted effectively [34,35,36]. The distortion caused by speckle noise may prevent the generation of a coherent 3D representation and become a cause of poor, sparse point clouds. Furthermore, estimating a mesh from such a sparse and noisy point cloud is challenging. A noisy point cloud is more likely to adversely impact the accurate computation of the SDF [37, 38], which then cannot represent the actual object surface and disrupts the mesh generation process, resulting in poor mesh structures such as mesh holes (unwanted gaps in the surface of a triangular mesh). These mesh holes distort the actual topology of the tumor mesh and mislead the breast tumor classification task [38]. In order to identify meshes with deceptive structures, an experiment is conducted quantifying the percentage of mesh holes present in a mesh. The quality of medical data can significantly impact the diagnosis process [39, 40]. As this study diagnoses breast tumors with medically significant mesh features, it is considered essential to work with high-quality meshes. Therefore, meshes with more than 10% of their structure compromised by mesh holes are considered bad tumor meshes and are excluded from the further diagnosis process. This criterion is applied consistently to all meshes during the mesh removal process, ensuring transparency and consistency in the identification of structurally compromised meshes. It upholds a high standard of mesh quality, reducing the probability of uncertainties due to structural compromise that could affect the precision of the breast tumor diagnosis.
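The text does not state how the hole percentage is computed; a plausible proxy, sketched below with trimesh, is the fraction of edges that belong to exactly one face (boundary edges), which is zero for a watertight mesh:

```python
import trimesh

def hole_edge_fraction(mesh: trimesh.Trimesh) -> float:
    """Fraction of unique edges lying on a hole boundary.

    Edges referenced by exactly one face bound a gap in the surface;
    a watertight mesh has none of them.
    """
    boundary = trimesh.grouping.group_rows(mesh.edges_sorted, require_count=1)
    return len(boundary) / len(mesh.edges_unique)

mesh = trimesh.load('tumor_mesh.ply')
if hole_edge_fraction(mesh) > 0.10:   # the study's 10% exclusion criterion
    print('Mesh excluded: structural compromise above threshold')
```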

Hence, after generating meshes for the 647 images, some are removed due to the presence of deceptive structures, resulting in a dataset of 445 meshes: 294 benign and 151 malignant tumors. This updated mesh dataset is used in the subsequent classification procedures. Sample mesh structures generated from the tumor ROIs are shown in Fig. 4.

Fig. 4 Tumor ROIs with their corresponding meshes

Feature Extraction

Clinically relevant features are superior for the classification of breast tumors. Benign and malignant tumors differ in shape, structure, volume, and tumor orientation pattern [41]. We propose a set of ten mesh features to differentiate malignant and benign characteristics. The features are separated into two categories: structural-based and distance-based.

Structural-Based Features

The structural-based mesh features include curvature, volume, surface area, eigen entropy, anisotropy, and sphericity. Most of these features are calculated from eigenvalues. The covariance features of anisotropy, centroid, and eigen entropy of the tumor ROI mesh tensor can be obtained from the normalized eigenvalues \(\lambda_i\). The covariances for the mesh triangle are defined as \(\lambda_1\), \(\lambda_2\), and \(\lambda_3\), ordered as \(\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge 0\) [42, 43]. A description of the structural-based features is given in Table 1.

Table 1 The structural-based mesh features

Malignant tumor structures often have spiculated margins and are non-uniform, while benign tumor structures have a smooth and consistent pattern [10, 11]. The structural mesh features usually have higher values for malignant tumors.
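The eigenvalue-based features can be computed from the vertex covariance matrix. The sketch below uses the standard covariance-feature definitions from the point cloud literature [42, 43]; assuming they match the formulas in Table 1:

```python
import numpy as np
import trimesh

def structural_features(mesh: trimesh.Trimesh) -> dict:
    """Covariance (eigenvalue-based) features of a tumor mesh."""
    centered = mesh.vertices - mesh.vertices.mean(axis=0)
    lam = np.linalg.eigvalsh(np.cov(centered.T))[::-1]  # sorted so λ1 ≥ λ2 ≥ λ3
    lam = lam / lam.sum()                               # normalized eigenvalues
    return {
        'anisotropy':          (lam[0] - lam[2]) / lam[0],
        'sphericity':          lam[2] / lam[0],
        'eigen_entropy':       -np.sum(lam * np.log(lam + 1e-12)),
        'change_of_curvature': lam[2],                  # λ3 / (λ1 + λ2 + λ3)
        'surface_area':        mesh.area,
        'volume':              mesh.volume,
    }
```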

Distance-Based Features

The distance-based mesh features are based on distances between mesh points. The minimum distance of the bounding box, maximum distance of the bounding box, farthest distance, and centroid distance are considered distance-based features. The minimum bounding box distance (BB min distance) and maximum bounding box distance (BB max distance) represent the shortest and longest distances between points of the mesh and its bounding box. The farthest distance represents the maximum distance between two vertices of the mesh. The centroid distance refers to the Euclidean distance between the center of the mesh object and the mesh points, where the centroid represents the central location of all the mesh faces. Because malignant tumors have curvy, irregular, and non-uniform structures, the values of the distance-based features are usually higher for the mesh of a malignant tumor. Since the benign tumor mesh is smooth and uniform, and the angles between the mesh triangles are minimal, the distance-based scores are lower for benign tumors, except for the centroid distance feature. Figure 5 shows a visualization of the mesh with the bounding box and the curvature.

Fig. 5 Visualization of the mesh with bounding box and curvature

Feature Pattern Analysis

In this experiment, a feature pattern analysis is performed with the objective of comparing the feature values of the benign and malignant classes. The analysis is carried out by computing the mean of each feature for each class individually. Table 2 presents the outputs of the feature pattern analysis.

Table 2 Feature pattern analysis (individual and mean feature value)

In Table 2, five random mesh feature vectors from the malignant and benign classes are shown to examine the actual distribution pattern of their features and emphasize the differences. Then, the average (μ) values of each feature are computed for both classes. The process involves six cases: in the first five, 70, 90, 110, 130, and 150 randomly selected feature values are taken and the averages are computed for each scenario. In the last case, the average is calculated over all mesh feature values of each class. By analyzing the average values derived from each case, a difference between malignant and benign features is found which can cluster them into two groups according to their value distribution patterns.

It is observed that the sphericity values for malignant tumors are much higher than those for benign tumors. As benign tumors are mostly round or oval shaped [47], they have a higher likelihood of closely resembling a sphere and thus maintain a lower sphericity value; as a result, the average sphericity values for benign tumors are approximately half those of malignant tumors. Moreover, the values of the anisotropy and eigen entropy features are notably higher in malignant tumors than in benign tumors. This disparity denotes the presence of non-uniformity in malignant tumors and explicit uniformity in benign tumors. The curvature, farthest distance, BB min distance, and BB max distance features also show greater values for malignant tumors than for benign ones. Conversely, the values of the centroid distance are higher for benign tumors, as their compact cell arrangement [48] leads to a larger centroid distance, while the scattered cell pattern and heterogeneity [49] of malignant tumors lead to smaller centroid distance values. These outcomes evidence the pattern complexity of malignant tumors, which contain poorly structured and curvy lesions. Tumor size can be a prominent discriminator between benign and malignant tumors [50]. Analyzing the volume and surface area features, it is evident that malignant tumors are larger than benign ones. This discussion aligns with the observations of the feature pattern analysis. Thus, the experiment discovers important insights into the distinctive feature patterns of both groups.

Feature Selection

Feature extraction is used to obtain the essential markers of an object, whereas feature selection selects the most informative and relevant features for the classification task. Using these features, a cost-effective and fast system can be achieved, while reducing overfitting and enhancing interpretability [51]. A univariate feature selection method, ANOVA, and a multivariate feature selection method, MRMR, are applied in this study to assess the significance of the individual features, filter out irrelevant features, obtain the optimal feature combination, and evaluate the differences between the two classes [52, 53]. The result of the ANOVA test is given in Table 3.

Table 3 Result analysis of ANOVA test

A difference between separated groups is considered significant if the p value is less than 0.05 [54]. From Table 3, we can see that all p values are extremely low, indicating a large difference between the compared groups and validating that these features are suitable for the classification task.
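The ANOVA F-test underlying Table 3 corresponds to a one-line call in scikit-learn; the file and column names below are assumptions about the feature table’s layout:

```python
import pandas as pd
from sklearn.feature_selection import f_classif

df = pd.read_csv('mesh_features.csv')          # hypothetical file: the 445 x 12 table
X = df.drop(columns=['unique_id', 'target'])   # assumed column names
y = df['target']

F, p = f_classif(X, y)                         # one-way ANOVA F-test per feature
for name, f_val, p_val in zip(X.columns, F, p):
    print(f'{name:22s} F = {f_val:9.2f}   p = {p_val:.2e}')
```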

The MRMR feature selection technique is used to rank the features based on their relevance for the target groups [55]. Figure 6 shows the MRMR ranking for the mesh features.

Fig. 6 MRMR ranking of the mesh features based on their relevance to the target groups

All features score more than 50% (see Fig. 6), revealing a strong correlation between the features and the target groups. All features perform well for both ANOVA and MRMR, proving them appropriate for classifying the breast tumor meshes.
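scikit-learn has no built-in MRMR, so a common greedy variant (F-test relevance divided by mean absolute correlation with already-selected features) is sketched below; this is our assumption of the ranking procedure, not the paper’s exact implementation:

```python
import pandas as pd
from sklearn.feature_selection import f_classif

def mrmr_rank(X: pd.DataFrame, y, k: int = None) -> list:
    """Greedy MRMR: maximize relevance while penalizing redundancy."""
    k = k or X.shape[1]
    relevance = pd.Series(f_classif(X, y)[0], index=X.columns)
    corr = X.corr().abs()                      # pairwise feature redundancy
    selected, remaining = [], list(X.columns)
    for _ in range(k):
        if not selected:
            best = relevance[remaining].idxmax()
        else:
            redundancy = corr.loc[remaining, selected].mean(axis=1)
            best = (relevance[remaining] / redundancy).idxmax()
        selected.append(best)
        remaining.remove(best)
    return selected                            # features in rank order

print(mrmr_rank(X, y))
```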

Graph Dataset Generation

A graph dataset is generated from the feature table to capture the profound semantic structure among meshes within the same classes. The dimensions of the feature table are 445 × 12, with ten feature columns, one target column, and one column consisting of a unique ID for each row. A row is a characteristic representation of a specific tumor mesh, and the unique ID column is only used to assign node numbers and identify graph nodes. The rows are considered graph nodes and the relationships between the rows are considered graph edges. For instance, when the relationship between the first row (unique ID = 1) and the second row (unique ID = 2) is established, an edge is drawn from the source node “1” to the target node “2.” The Spearman correlation coefficient is calculated to estimate the relationships between rows, excluding the target and unique ID columns as they do not contain any mesh feature information. This results in a connected graph dataset where all nodes are linked. The graph is generated with a 0.7 correlation threshold: if the correlation score is equal to or above 0.7, an edge \(e_k\) (where \(k \in \{1, 2, \dots, N\}\)) is constructed between two nodes \(n_i, n_j\) (where \(i, j \in \{1, 2, \dots, 445\}\) and \(i \neq j\)). Table 4 shows the correlation pattern of the graph dataset.

Table 4 Graph dataset containing source node, target node, and correlation between two nodes

The graph dataset in Table 4 displays the source and target nodes along with their correlation scores, outlining the procedure followed to generate edges based on the Spearman correlation coefficient. The table demonstrates that edge “1” links nodes “1” and “2” with a correlation score of 0.97, indicating a strong relationship between the nodes. The graph dataset of Table 4 reflects the connections between meshes, offering a powerful representation of the degrees of similarity and dissimilarity, which fluctuate according to the ten key feature values and the corresponding classes. In total, 56,054 edges are created among the 445 nodes: 11,325 edges connect nodes of the malignant class and 42,778 edges connect nodes of the benign class, indicating the presence of complex and substantial connections between nodes. The remaining 1951 edges run between the benign and malignant classes, because certain benign and malignant breast tumor lesions may exhibit a high degree of resemblance [56]. This dataset not only represents the structural relationships between meshes but also captures the nature and interactions of these relationships. It provides a comprehensive topological representation that preserves strong neighboring relations [57]. Thus, flexibility for in-depth analysis is achieved through the graph data structure, which enables exploration of the detailed underlying relationships. This approach facilitates a more profound understanding of the interactions between different features and classes, enhancing the efficacy of classification algorithms.
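Row-wise Spearman correlation and thresholding reduce to a few lines; `features` is assumed to hold the 445 × 10 table with the unique ID and target columns already dropped:

```python
import numpy as np
from scipy.stats import spearmanr

# Row-wise Spearman correlation: axis=1 treats each row (mesh) as a variable,
# giving a 445 x 445 correlation matrix between meshes.
rho, _ = spearmanr(features.values, axis=1)

# Keep each strongly correlated pair once (upper triangle, diagonal excluded).
src, dst = np.where(np.triu(rho >= 0.7, k=1))
edges = list(zip(src, dst))
print(f'{len(edges)} edges among {len(features)} nodes')
```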

Model

There have been several attempts to adapt neural networks to deal with arbitrarily structured graphs. Sequence-based approaches nowadays make use of attention mechanisms [58]. One advantage of attention mechanisms is that they make it possible to handle inputs of different sizes while concentrating on the information most important for making decisions. Self-attention and intra-attention are terms used to describe an attention process that computes a representation of a single sequence [59]. In this classification approach for breast ultrasound images using 3D mesh features, a graph-based GAT model is introduced, and this model is optimized with an ablation study in which seven parameters of the model are modified. The ablation study results can be found in Table 5.

Graph Attention Layer

The building block layer used to create arbitrary graph attention networks is described in this section, along with its theoretical and practical advantages and potential applications in neural graph processing.

This section outlines the GAT model’s single preprocessing layer for the graph attention mechanism. The entire processing method is depicted in Fig. 7.

Fig. 7 Mechanism of the graph attention layer processing

The input layer is the set of nodes representing the features: \(X = \{x_1, x_2, x_3, \dots, x_n\}\), \(x_i \in \mathbb{R}\), where \(X\) refers to the feature values and \(\mathbb{R}\) represents the real numbers. The edges can be represented as \(e_{ij} = a(Wx_i, Wx_j)\), where \(W\) is a shared weight matrix and \(x_i, x_j\) are the related nodes. The transformed node features \(Wx_i\) and \(Wx_j\) are used to compute attention coefficients, which are normalized by the softmax function into \(\alpha_{ij}\) [59]. The equation is:

$$\alpha_{ij} = \mathrm{softmax}_j\left(e_{ij}\right) = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}$$
(5)

An attention score is computed from the concatenated node weights using the LeakyReLU activation function, a variant of the ReLU activation function. After the attention scores for all nodes are calculated, they are aggregated per node using a bin-count operation (“math.bincount”). To ensure that the attention scores lie in a consistent range and to avoid scaling issues, the attention scores are normalized during further preprocessing. The normalized attention scores are then applied to all nodes: each node’s transformed features are multiplied by its associated normalized attention score, effectively emphasizing specific nodes based on their significance. A multi-head attention layer is applied as the final preprocessing step. The computation of multiple sets of attention weights, each referred to as a “head,” constitutes the multi-head attention mechanism. These attention heads make it possible to learn various patterns by capturing different interactions between the nodes. Concatenating the outputs of these attention heads creates a final layer output that combines the information gained from different characteristics of the input nodes.
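A minimal single-head version of this attention layer can be sketched in PyTorch as follows; the paper does not publish its implementation, so the layer structure and names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention layer (Velickovic et al. [59]);
    a minimal sketch, not the paper's exact implementation."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared node transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention vector

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim); edge_index: (2, E), self-loops assumed to be included
        h = self.W(x)
        src, dst = edge_index
        # e_ij = LeakyReLU(a([Wx_i || Wx_j])), one score per edge
        e = F.leaky_relu(self.a(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1))
        # softmax over each destination node's incoming edges (Eq. 5)
        num = torch.exp(e - e.max())
        den = torch.zeros(x.size(0), device=x.device).index_add_(0, dst, num)
        alpha = num / den[dst]
        # weighted aggregation of neighbor features
        return torch.zeros_like(h).index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
```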

GAT

Our GAT model combines two dense layers and two multi-head graph attention layers based on the multi-head graph attention mechanism. Figure 8 shows the optimized architecture of the GAT model after the ablation study.

Fig. 8 Proposed model architecture after ablation study

The proposed model transforms the features into a complex graph representation. The nodes, edges, and their weights together create the input for our model. After the node-edge connection, the nodes are connected with the dense layer and the edges are connected with the attention layer. A dense layer with 4096 parameters, a multi-headed graph attention layer with 263,168 parameters, and a final dense layer with 1026 parameters are part of each of the three blocks. An ablation study is carried out to optimize the model, investigating the effects of various elements on the performance.
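Under the configuration reported by the ablation study (32 hidden units, 8 heads, relu), the overall network might be assembled as below, reusing GraphAttentionLayer from the previous sketch; the exact block layout is an approximation of Fig. 8:

```python
import torch
import torch.nn as nn

class GATClassifier(nn.Module):
    """Dense -> two multi-head attention blocks -> dense (approximate layout)."""

    def __init__(self, in_dim=10, hidden=32, heads=8, n_classes=2):
        super().__init__()
        self.dense_in = nn.Linear(in_dim, hidden)
        self.att1 = nn.ModuleList(
            [GraphAttentionLayer(hidden, hidden) for _ in range(heads)])
        self.att2 = nn.ModuleList(
            [GraphAttentionLayer(hidden * heads, hidden) for _ in range(heads)])
        self.dense_out = nn.Linear(hidden * heads, n_classes)

    def forward(self, x, edge_index):
        h = torch.relu(self.dense_in(x))                # relu from the ablation study
        h = torch.cat([a(h, edge_index) for a in self.att1], dim=-1)  # concat heads
        h = torch.cat([a(h, edge_index) for a in self.att2], dim=-1)
        return self.dense_out(h)                        # class logits per node
```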

One-Dimensional CNN Model

Sequential data analysis commonly utilizes a one-dimensional (1D) CNN model. Convolutional layers, pooling layers, and one or more fully connected layers are the components of the 1D CNN architecture. The convolutional layers incorporate a number of filters to extract relevant features from the input data, and the pooling layers downsample the output of the convolutional layers to reduce the number of parameters in the network [60]. The fully connected layers generate predictions using the extracted features.
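A minimal 1D CNN baseline over the ten-value feature vectors might look as follows; the filter counts and kernel sizes are illustrative assumptions, since the paper does not specify them:

```python
import torch
import torch.nn as nn

class CNN1D(nn.Module):
    """1D CNN over the ten-value feature vector: conv -> pool -> conv -> FC."""

    def __init__(self, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),                        # downsample the feature sequence
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),  # collapse to one vector per sample
            nn.Linear(32, n_classes),               # fully connected prediction head
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x.unsqueeze(1))             # (batch, 10) -> (batch, 1, 10)
```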

Results

Our proposed model is evaluated using several performance metrics: training accuracy, test accuracy, precision, recall, F1 score, negative predictive value (NPV), false positive rate (FPR), false discovery rate (FDR), false negative rate (FNR), and Matthews correlation coefficient (MCC). These metrics are calculated from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values acquired from the confusion matrix. An ablation study is performed, and the performance of the model is compared with previous literature and ten machine learning models.
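All of these metrics follow directly from the four confusion matrix counts; a small helper makes the definitions explicit:

```python
def metrics_from_confusion(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive the reported metrics from the four confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                     # also called sensitivity
    return {
        'accuracy':    (tp + tn) / (tp + tn + fp + fn),
        'precision':   precision,
        'recall':      recall,
        'specificity': tn / (tn + fp),
        'f1':          2 * precision * recall / (precision + recall),
        'npv':         tn / (tn + fn),
        'fpr':         fp / (fp + tn),
        'fdr':         fp / (fp + tp),
        'fnr':         fn / (fn + tp),
        'mcc':         (tp * tn - fp * fn)
                       / ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5,
    }
```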

Ablation Study

Seven ablation experiments are carried out to optimize the performance of the proposed GAT model. In each stage, the configuration resulting in the highest test accuracy is chosen for further experimentation. The outcome of the seven ablation experiments is listed in Table 5.

Table 5 Ablation study for GAT model optimization

In the first experiment, three activation functions, the exponential linear unit (elu), the rectified linear unit (relu), and the hyperbolic tangent (tanh), are used, and the highest accuracy of 96.71% is attained with the relu activation function. Next, two hidden unit configurations, [32] and [64], are tested, and a test accuracy of 97.37% is achieved with the [32] hidden units. In the third experiment, the model is tested with 2, 3, and 4 attention layers, and the best accuracy of 98.03% is obtained with four layers. Three learning rates are then tested, resulting in a test accuracy of 98.68% with a learning rate of 0.000001. Finally, different combinations of batch size, momentum, and number of heads are tested. The highest accuracy of 99.34% is obtained with a batch size of 64, a momentum of 0.9, and 8 heads, in 45 s. The results of the ablation study indicate that the model achieves its highest accuracy with the relu activation function, [32] hidden units, 4 attention layers, a learning rate of 0.000001, a batch size of 64, a momentum of 0.9, and 8 heads.

GAT Model’s Performance at Different Correlation Threshold

The initial graph for the feature table (445 × 10, excluding the unique ID and target columns) contains 98,790 node connections. Such a complex and large graph can limit the model’s performance, as it may contain noisy and redundant information [61]. To identify the optimal correlation threshold, an experiment is carried out, as shown in Table 6, to obtain a graph that captures a significant number of node connections.

Table 6 Proposed model’s performance at different thresholds

The Spearman correlation value determines how strongly two nodes are connected. The GAT model gives an accuracy of 96.71% both when all edges are considered and when only edges with a correlation value of 0.5 or higher are kept, highlighting the impact of irrelevant edges on the model’s performance. The model reaches an increased accuracy of 97.37% for the threshold ≥ 0.6. For the thresholds ≥ 0.8 and ≥ 0.9, the model obtains 98.68% accuracy, as fewer redundant edges may be present. This study therefore chooses the threshold value of ≥ 0.7, which proves optimal, giving the highest accuracy of 99.34%.

Performance Analysis of Proposed GAT Model

Several performance metrics, including precision, specificity, sensitivity, NPV, FPR, FDR, FNR, accuracy, F1 score, and MCC, are calculated to assess the performance of the proposed model. Figure 9 shows the confusion matrix of the model.

Fig. 9 Confusion matrix of the proposed model

The performance metrics are computed using the confusion matrix’s values for TP, TN, FP, and FN. The results are shown in Table 7.

Table 7 Evaluation metrics of the GAT model

The model’s performance metrics are shown in Table 7, where the highest test accuracy of 99.34% stands out. Apart from its high accuracy, the model also performs well on the other measures: sensitivity, precision, specificity, F1 score, and MCC are all above 97%. The loss and accuracy curves of the model are shown in Fig. 10.

Fig. 10 Loss and accuracy curves of the proposed GAT model

Although there is some variability in the validation accuracy and loss curves, overfitting is not significantly present. This indicates that the proposed model is robust and stable.

k-Fold Cross Validation

In k-fold cross validation, the data is shuffled randomly and split into k groups. In five-fold cross validation, the dataset is split into five subsets, where four subsets are used for model training and validation is done on the fifth subset. Using a k value of 5, the five-fold cross validation results are shown in Fig. 11.

Fig. 11 Five-fold cross validation for performance analysis

It can be observed in Fig. 11 that, across all folds and iterations, the GAT model obtains a validation accuracy above 97%, demonstrating reliable performance in the classification task. This outcome implies that the model is robust enough to operate without overfitting issues.
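The protocol corresponds to a standard five-fold split; using stratification to preserve the benign/malignant ratio is our assumption, and `train_model`/`evaluate` stand in for the actual training and evaluation routines:

```python
from sklearn.model_selection import StratifiedKFold

# X: (445, 10) feature matrix, y: benign/malignant labels, as in earlier sketches.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y), start=1):
    model = train_model(X[train_idx], y[train_idx])   # hypothetical training helper
    acc = evaluate(model, X[val_idx], y[val_idx])     # hypothetical evaluation helper
    print(f'fold {fold}: validation accuracy = {acc:.4f}')
```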

Performance Comparison with ML and CNN Models

We have conducted a comparison between our GAT model and ten machine learning models: random forest classifier (RFC), gradient boosting classifier (GBC), extra trees classifier (ETC), bagging classifier (BC), support vector machine (SVM), support vector classifier (SVC), decision tree classifier (DTC), Gaussian naive Bayes (GNB), k-nearest neighbors (KNN) classifier, and logistic regression (LR), as well as a 1D CNN model. All of these models utilize the same hand-crafted feature set (445 × 11) used for training and testing the proposed GAT model; the “unique ID” column of the feature set (445 × 12) is excluded as it does not contain any mesh-related feature values. The models are therefore evaluated on the same subset of data, ensuring consistent inputs throughout the comparison. The results of this comparison are presented in Table 8. This comparison enables us to evaluate the effectiveness and potential superiority of our GAT model relative to the other ML models.
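The baseline comparison can be reproduced with scikit-learn’s default implementations; mapping the separately listed SVM and SVC to LinearSVC and SVC is our assumption:

```python
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              ExtraTreesClassifier, BaggingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

models = {
    'RFC': RandomForestClassifier(), 'GBC': GradientBoostingClassifier(),
    'ETC': ExtraTreesClassifier(),   'BC':  BaggingClassifier(),
    'SVM': LinearSVC(),              'SVC': SVC(),
    'DTC': DecisionTreeClassifier(), 'GNB': GaussianNB(),
    'KNN': KNeighborsClassifier(),   'LR':  LogisticRegression(max_iter=1000),
}

# Same hand-crafted features as the GAT model; split ratio is an assumption.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f'{name}: test accuracy = {model.score(X_te, y_te):.4f}')
```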

Table 8 Comparison with proposed GAT model

The results presented in Table 8 indicate that, in comparison to our proposed model, the performance of the ML models and the 1D CNN model is significantly lower. The 1D CNN model achieves an accuracy of 87.64%, while the accuracies of the ML models range from 73 to 91%. The substantial performance difference between the proposed model and the traditional ML models can be attributed to several factors. ML models often lack the capacity to comprehend intricate data relationships in small datasets with limited detail [62]. Due to this confinement, it can be challenging to learn consistent data patterns during training, which may result in issues such as overfitting, underfitting, and ultimately lower performance [63, 64].

With the graph data representation, on the other hand, the proposed GAT model can capture the intricate data patterns. As described in the “Graph Dataset Generation” section, graph data structures are effective for demonstrating the complex relationships that exist between data points [65], which is particularly beneficial when working with small datasets. The proposed GAT model exploits the complex graph data structure and avoids being influenced by noise or irrelevant features by discovering the most important neighborhood connections and focusing on relevant information, thereby achieving its remarkable performance.

The results demonstrate that the proposed model outperforms all ML models and the 1D CNN model, indicating its superior ability to learn and make accurate predictions by comprehending data relationships within a graph format, as capturing significant patterns is crucial for achieving high performance.

Comparison with Previous Studies

In this section, a comparison table is presented which includes the key information of each relevant research study compared to our methodology. Authors, year of publication, methodology, and results are shown in Table 9.

Table 9 Comparison with previous literature

From Table 9, it can be observed that the accuracies of prior studies lie between 91 and 97%. Those studies are performed on 2D images, where important information such as the volume and curvature of breast tumors cannot be assessed. Most authors used CNN and deep learning models, which means that the time complexity of their classification task is much higher. Consequently, our proposed approach demonstrates efficiency and reliability with higher test accuracy and lower computational complexity.

Discussion

A novel method for classifying diseases is to create a point cloud from a 2D image and then generate a 3D mesh, which may provide more valuable information for precise diagnosis. This study presents a novel approach to the classification of breast cancer using ultrasound images. A 2D image is converted into a 3D representation, enabling an in-depth analysis of the underlying structures and features of the breast tumor. 3D mesh features are extracted to describe the tumor pattern more precisely. The features are clinically relevant, representing tumor characteristics in a more informative way, and are used for graph-based classification with node-to-node relationships. The classification with the GAT model allows integrating the graph representation of the features, which contributes to increasing the classification performance. In future work, we aim to utilize a larger breast cancer dataset. The proposed approach can then be evaluated on a real-world dataset, so that an automated method can assist radiologists in a real-world diagnosis system. In addition, other classes of breast cancer can be explored to discover more in-depth knowledge of tumor patterns, enabling automated analysis of breast cancer progression.

Conclusion

In this paper, we have presented a classification of breast tumors from ultrasound images by applying 3D mesh reconstruction to the 2D breast ROI image. The mesh is generated with the point-e network, where a point cloud is generated from the 2D image, and a 3D triangular mesh is constructed from the dense point cloud. To obtain useful information for the classification task, ten mesh features are extracted. In addition, two feature selection techniques, ANOVA and MRMR, are applied to determine the features that strongly influence the prediction of the target class; all ten features are found to be robust. Furthermore, a graph dataset is generated from the ten features and fed into the GAT model, resulting in an accuracy of 97.1%. To enhance the GAT model’s performance, an ablation study is conducted experimenting with seven parameters. The modified GAT model records an accuracy of 99.34%, which validates that the model is robust enough to classify meshes of the breast ROI. To conclude, this study demonstrates that 3D meshes, appropriate feature extraction, and a graph-based model can greatly advance the breast tumor classification task.