1 Introduction

Cervical cancer poses a significant health concern for women and has been recognized as the fourth leading cause of female mortality [1]. According to the World Health Organization (WHO), the number of affected patients and deaths is increasing alarmingly. In 2020, more than 60,400 new cases were reported, and the number of deaths reached 34,200 [2]. Infection of the cervical cells, located at the bottom of the uterus, with the human papillomavirus (HPV) is the primary cause of this disease [3]. Additionally, having multiple sex partners and initiating sexual activity at an early age can increase the risk of developing cervical cancer [4]. However, early treatment, such as vaccination, radiotherapy, and surgery, can increase the patient’s survival rate. Several developed nations have introduced various cytology screening techniques to decrease the mortality rate associated with cervical cancer [5]. According to the WHO, regular screening at ten-year intervals has the potential to reduce the occurrence of cervical cancer by 65% [6]. However, manual analysis by medical experts is time-consuming and requires considerable effort for accurate screening, particularly as the number of positive cases increases. The shortage of medical practitioners and resources, together with the remoteness of many patients, is a major concern that adversely affects treatment outcomes. Computer-aided systems can substantially lessen these challenges and help professionals treat patients more effectively. Moreover, by analyzing pap smear images automatically, they can reduce the error rate and serve as an efficient and effective diagnostic tool.

Over the last few years, the combination of advanced machine learning, deep learning, and image processing techniques has facilitated the automation of cervical cancer diagnosis using pap smear images of cervical cells [7,8,9]. Researchers worldwide use these advanced techniques to examine the cervical cell and segment the nucleus and cytoplasm from the cell to detect the abnormality of the cervical cell [10, 11]. In the last ten years, several studies have adopted different machine-learning models to classify cervical cancer cells using clinical data [12]. Moreover, the neural network-based model is widely used for its effective predictive capability. Proper feature analysis or discovering the impact of the features and their proper visualization can be a more practical approach to analyzing medical images. The traditional diagnosis tool for cervical cancer comprises three individual techniques: segmentation, feature extraction, and classification [7]. Segmentation includes the cytoplasm and nucleus separation from the pap smear images. Then, the handcrafted features, including texture, morphological, and intensity-based features, are extracted from the segmented area. And finally, the classification layer classifies the abnormalities. Including all these techniques substantially improves the efficacy of the diagnostic tool.

This study proposes an automated computer-aided approach for diagnosing cervical cancer by investigating the impactful features of histopathological images. This study aims to build a computerized system using clinically significant features containing the cytological and nucleus characteristics. A graph is constructed using the relationship among the important features, and a graph convolutional network (GCN) architecture is developed to perform the classification task. The cervical Pap smear image clearly distinguishes between healthy and abnormal cell distributions in color and shape. As a result, the handcrafted features containing the cells’ texture, color, shape, or uniformity properties are retrieved and formulated in the tabular dataset for further graph construction. The proposed framework is evaluated on a public dataset and obtains satisfactory outcomes. Moreover, an extensive result analysis is performed to assess the effectiveness of the proposed approach.

The contribution of this paper is summarized below:

  i. Segmentation of the multi-cell nucleus and cytoplasm is performed concurrently from histopathological images using several widely used image preprocessing methods.

  ii. Twenty-six handcrafted features are extracted from each of the segmented nucleus and cytoplasm regions.

  iii. A broad feature analysis is performed to identify the significant features, and a feature evaluation is performed to find the pattern among them.

  iv. A graph is constructed using the strongest relationships among the features, measured with the Pearson correlation method. The feature columns are assigned to the nodes, and edges are created between nodes whose positive correlation score exceeds 0.5.

  v. A GCN architecture is implemented to classify the cells into five classes: dyskeratotic, superficial, parabasal, metaplastic, and koilocytotic. The performance is compared with multiple machine learning models to validate the robustness of the GCN model.

  vi. The effectiveness of the model is demonstrated on two distinct datasets, highlighting its robustness and adaptability in various scenarios.

Cervical cancer is a major threat to women. The statistical data highlights the urgency of the prevention of this disease. Comparison with prior studies indicates that our proposed approaches could be a reliable method for improving cervical cancer diagnosis.

The remainder of the paper is organized as follows. Section 2 presents a comprehensive literature review and summarizes the objectives and limitations of prior work in a table. Section 3 describes the dataset with a complete figure and table, the segmentation process, and the feature analysis. Section 4 presents the methodology of the proposed study, which includes graph representation, ablation study, and graph modeling. Section 5 outlines the result analysis of the implemented techniques and the performance comparison with machine learning models and existing literature. An extensive discussion of this approach and the limitations of this study are highlighted in Section 6. Section 7 concludes the study by emphasizing the usefulness of this investigation for cervical cancer diagnosis, and finally, Section 8 discusses potential future directions.

2 Literature review

The use of computer-aided systems to diagnose cervical cancer has attracted considerable interest worldwide. Several studies [13,14,15,16,17] adopted machine learning and deep learning techniques to create an automated diagnosis system. Machine learning techniques can enhance the effectiveness of cervical cancer screening and thereby improve healthcare outcomes. In recent years, they have been applied to different kinds of images, such as histopathological and X-ray images, along with clinical data to diagnose cervical cancer comprehensively.

In recent years, several studies have applied different machine-learning models to classify cervical cancer. Mamun et al. [18] conducted a machine learning-based statistical analysis using three feature transformation strategies to assess the best classification results. The highest accuracy of 98.45% was achieved by a random forest classifier. Lu et al. [12] introduced an ensemble machine-learning classifier for cervical cancer diagnosis. They applied a voting method to both public and private datasets and effectively enhanced the performance of the model.

Benhari et al. [19] also developed an ensemble technique to detect cervical cancer. They employed a belief network in the classification layer and used Dempster’s combination rule to reduce uncertainty and enhance the model performance. The proposed classifier obtained an accuracy of 97.2% on the SipakMed dataset.

Deep learning techniques are also emphasized in cervical cancer screening. Erdal et al. [20] interpreted pap smear images to detect cervical cancer. They used hash coding with a CNN architecture to generate the interpretable mask. Their proposed technique achieved a satisfactory outcome. Ritesh et al. [21] presented two ensemble approaches to classify cervical cell images, combining a CNN architecture with the vision transformer (ViT-CNN) and with long short-term memory (LSTM), respectively. The ViT-CNN model obtained the best test accuracy of 97.6% compared with the other approach. With the help of deep learning techniques, Orhan et al. [22] implemented a feature extraction model to classify pap smear images. They extracted features using transfer learning based on DarkNet19 and DarkNet53 and identified the optimum feature set using a feature ranking technique. The highest accuracy of 99.46% was obtained by a support vector machine (SVM) classifier. Elima et al. [23] designed an automated system to perform instance segmentation and classification on three cervical cancer image datasets. The proposed UNET architecture was developed to segment the nucleus. This approach achieved a reliable accuracy of 98.8% on binary classification. Rahman et al. [7] proposed a deep learning-based feature fusion strategy for cervical cell classification. After preprocessing, they trained four transfer learning models on the images to extract and normalize the features and obtained accuracies of 99.85%, 99.38%, and 99.14% for 2-class, 3-class, and 5-class classification, respectively. Jun et al. [24] implemented a graph convolution network to classify cervical cell images. They extracted the features using a CNN architecture, determined the clusters of the features, and constructed graphs using the relationship among the clusters. Finally, the graph convolution technique was applied to these graphs, and an accuracy of 98.37% was obtained. Qin et al. [25] introduced a feature fusion model for automatic cervical cancer cell classification. They developed a multi-task model incorporating feature fitting and label smoothing methods to improve the model performance. The model was executed on the HUSTC and SipakMed datasets and obtained accuracies of 81.88% and 91.67%, respectively. Bashak et al. [26] developed an automated system for cytology classification from three datasets. They extracted the features using several CNN models and used an evolutionary optimization algorithm to find the optimal feature set. The SVM classifier obtained the best accuracies of 99.47%, 98.32%, and 97.87% on the three datasets, respectively. Kundu et al. [8] utilized an evolutionary algorithm and pre-trained deep-learning models for cervical cancerous cell classification. They extracted features from the cell images using ResNet-18 and GoogleNet models and applied a genetic algorithm to determine the optimal features. Finally, an SVM classifier was used for the final classification. The proposed model was evaluated on two public datasets and obtained satisfactory outcomes. Alyafeai et al. [27] developed an automated pipeline for detecting and classifying cervical images. The pipeline consists of two deep-learning models for the detection and classification tasks. The IOU measurement indicates that the study obtained a detection accuracy of 68%, and the classification model achieved an accuracy of 82%. Deploying the system on mobile devices enhances the practicality of the study. Khamparia et al. [28] introduced a combination of convolutional and autoencoder-based networks to diagnose cervical cancer. They conducted experiments on the Herlev dataset, where the convolutional layers extract features and the autoencoder part reduces the data dimensionality. The best accuracy obtained in the experiment was 99.2%.

The primary concerns with most of the research are the inadequate execution of accurate nucleus and cytoplasm segmentation and the missing feature analysis, which can provide valuable insight into the dominance of the features in the model outcomes. They emphasize the model’s architecture, and the outcome relies on only fine-tuning this model. However, this study adhered to image processing, segmentation, and feature analysis simultaneously, which is crucial in the context of histopathological imaging. Besides, limited studies add the graph representation and graph-based approach for the classification of cervical cancer abnormality. This study addresses all these issues substantially. It aims to propose an automated approach for cervical cancer diagnosis efficiently. The segmentation technique, feature analysis, and the proposed GCN model, which effectively contributes to developing a reliable cervical cancer diagnosis system, are highlighted meticulously in the subsequent section.

3 Materials and methods

This study introduces a complete framework to diagnose cervical cancer from histopathological images. The entire methodology is illustrated in Fig. 1. After acquiring the images, the first step is to segment the nucleus and cytoplasm from the images. Twenty-six handcrafted features are generated for each segmented image (nucleus and cytoplasm). The use of handcrafted features has proven significant in medical image analysis, providing necessary insights about the pathological properties of the images and substantially contributing to an accurate diagnosis [29]. Two feature ranking techniques are evaluated to identify the optimal features. Moreover, we construct a graph dataset based on the strongest positive correlations between the features, determined with the Pearson correlation technique. Then, a GCN architecture is implemented to validate our approach. Several performance metrics are used to identify the best result.

Fig. 1 Workflow diagram of the proposed study

3.1 Dataset description

The study uses the ‘SipakMed’ cervical cancer dataset [30]. The dataset comprises 4049 cell images divided into five classes: dyskeratotic cells, superficial intermediate cells, parabasal cells, metaplastic cells, and koilocytotic cells [30]. The number of cell images in each class is 813, 825, 793, 787, and 831, respectively. The dimensions of the images are 2048 × 1536, and all the images are in PNG format. Dyskeratotic and superficial intermediate are the sub-classes of normal cells. The benign class includes metaplastic cells, whereas parabasal and koilocytotic cells are considered abnormal. This dataset has an adequate sample size, enabling the model to capture the diverse patterns within the data and contributing to a satisfactory outcome. Besides, the dataset is well-annotated, which establishes its credibility. It also contains multiple nuclei in a single cell image, which increases the complexity of segmentation and classification compared to single-nucleus images. Consequently, this study aims to address this complexity by introducing a novel segmentation technique and showcasing the proposed model’s ability in real-world scenarios. Finally, few GCN-based methods have been applied to this dataset for cervical cancer diagnosis, which motivates this study to conduct its experiments on it.

Figure 2 shows original images from the SipakMed and Herlev datasets. The nuclei of normal classes are usually flat, round, or oval. The abnormal nuclei are found to be larger than the normal nuclei. The human papillomavirus (HPV) changes their morphological structures, and the benign class is developing cancerous conditions [31]. This study considers the SipakMed dataset as the primary dataset. However, to test the effectiveness of our proposed model and segmentation approach, we also use the Herlev dataset to validate the overall performance. The Herlev dataset [32] is a publicly available dataset containing 917 pap smear images. It includes seven classes: superficial squamous epithelia, columnar epithelia, and intermediate squamous epithelia correspond to normal cells, whereas carcinoma in situ, mild dysplasia, moderate dysplasia, and severe dysplasia belong to abnormal cells. The results of the experiments conducted on this dataset strengthen the applicability of this study. Table 1 summarizes the key information about these two datasets.

Fig. 2 Sample images of the dataset

Table 1 A brief description of the two datasets

3.2 Nucleus and cytoplasm segmentation

The nucleus and cytoplasm are the primary and essential components of histopathology images. They are the critical markers that provide insight into cellular morphological abnormalities, which aids precise diagnosis. Segmenting these components helps to acquire texture, shape, and intensity-based information and reveals how the features differ across the classes. Besides, the relationship between the cytoplasm and nucleus features can be represented as graph-structured data, which allows the model to learn connectivity patterns and supports satisfactory diagnostic outcomes. Therefore, this study focuses on segmenting the nucleus and cytoplasm to improve its applicability. However, segmenting nuclei and cytoplasm from histopathological cervical cell images is challenging [33]. This study proposes an efficient segmentation technique that segments the nucleus and the cytoplasm simultaneously.

The proper measurement of different significant cellular properties such as shape, texture, and statistical information for both the nucleus and cytoplasm can give insights related to the development of the different types of cervical cells. The visual representation of the segmentation process is shown in Fig. 3.

Fig. 3 Segmentation process of nuclei and cytoplasm

First, the images are converted to grayscale and de-noised with a median filter, which replaces each pixel value with the median value of its neighbors, smooths the images, and preserves the significant details [34]. The Otsu thresholding algorithm is then used to separate the nuclei from the images [35]. In this study, the minimum and maximum intensity values used for thresholding are set to 100 and 150, respectively. A contour detection algorithm is applied to detect the boundaries of the nuclei regions. After that, the segmented nuclei area is subtracted from the images to retain only the cytoplasm area. An inpainting algorithm is employed to fill the missing regions of the images [36]. Then, the image is binarized to obtain only the segmented cytoplasm area. Finally, the watershed technique determines the object’s boundary and creates a line separating it from the irrelevant items [37].
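The first steps of this pipeline (median filtering, Otsu thresholding, and removal of the nucleus from the image) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `median_filter3` and `otsu_threshold`, the 3 × 3 window, and the synthetic image are all hypothetical, and the threshold is selected automatically from the histogram rather than from the intensity values of 100 and 150 reported above. Contour detection, inpainting, and watershed are omitted.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with edge replication (window size is an assumption)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the image and take the per-pixel median.
    windows = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)

def otsu_threshold(img):
    """Return the grey level that maximizes the between-class variance (8-bit input)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)               # weight of the class with grey level <= t
    mu = np.cumsum(prob * levels)      # cumulative mean up to t
    mu_total = mu[-1]
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(256)
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(between))

# Synthetic stand-in for a cell image: a dark "nucleus" on a brighter background.
rng = np.random.default_rng(0)
image = np.full((64, 64), 180, dtype=np.uint8)
image[24:40, 24:40] = 60
noisy = image.copy()
noisy[rng.integers(0, 64, 50), rng.integers(0, 64, 50)] = 255  # salt noise

smooth = median_filter3(noisy)
t = otsu_threshold(smooth)
nucleus_mask = smooth <= t                          # dark pixels are treated as nucleus
cytoplasm_only = np.where(nucleus_mask, 0, smooth)  # nucleus subtracted from the image
```

In practice, a library such as OpenCV would typically supply these operations together with the contour, inpainting, and watershed steps.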

3.3 Feature extraction

After obtaining the nucleus and cytoplasm regions from the cervical images, 26 handcrafted features were extracted from each of the segmented nucleus and cytoplasm images. This feature extraction process combines image processing techniques that quantify the shape, texture, and pixel-wise orientation and describe the meaningful patterns of the images. Additionally, these handcrafted features aid in differentiating abnormal and normal cell structures based on their value distributions. Several built-in libraries and packages are used to extract and analyze these features. A total of 52 features are extracted, of which 26 belong to the nucleus and 26 to the cytoplasm. These features are based on the ROI’s texture, shape, color, intensity, GLCM, and morphology. The features are described in Table 2:

Table 2 A brief description of the handcrafted features
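As an illustration of how such descriptors can be computed from a segmented region, the sketch below derives a few shape and intensity features from a binary mask. It is a hedged example only: the function name `region_features`, the chosen subset of features, and their exact definitions are assumptions, since the definitions of the 26 features are not reproduced here.

```python
import numpy as np

def region_features(gray, mask):
    """Compute a few shape and intensity descriptors for one segmented region."""
    pixels = gray[mask]
    area = int(mask.sum())
    # Diameter of the circle whose area equals the region's area.
    equiv_diameter = float(np.sqrt(4.0 * area / np.pi))
    # Shannon entropy (in bits) of the grey-level histogram inside the region.
    hist = np.bincount(pixels, minlength=256).astype(float)
    p = hist[hist > 0] / hist.sum()
    entropy = float(-(p * np.log2(p)).sum())
    # Energy (uniformity): sum of squared histogram probabilities.
    energy = float((p ** 2).sum())
    return {
        "area": area,
        "equiv_diameter": equiv_diameter,
        "mean": float(pixels.mean()),
        "std": float(pixels.std()),
        "entropy": entropy,
        "energy": energy,
    }

# Synthetic region: a 16 x 16 block, half at grey level 50 and half at 200.
gray = np.zeros((32, 32), dtype=np.uint8)
gray[8:24, 8:16] = 50
gray[8:24, 16:24] = 200
mask = gray > 0
features = region_features(gray, mask)
```

Applying the same function to the nucleus mask and the cytoplasm mask of one image would yield one row of the tabular dataset, with the two feature groups stored under distinct column names.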

3.4 Feature ranking technique

Feature rank analysis defines the evaluation of features according to their significance on a model’s prediction [38]. To determine the significant features, the feature ranking technique is performed. As the classification of this study is conducted through the graph-based method, feature selection is essential in generating an optimal graph. Two feature ranking techniques have been adopted, including random forest feature importance and information gain, and the features are separated into three distinct sets for further analysis. The random forest feature importance algorithm is widely used for its consistent performance. It determines the impurity reduction following each feature split. Each feature is given a feature rank score based on this calculation. The higher rank value denotes the significant features. Information Gain is a decision tree-based feature selection technique that reduces the features’ uncertainty. It calculates the entropy of each feature and compares it with the average weighted entropy. The features with the highest gain score are prioritized as the optimum features. We merged the nucleus and cytoplasm features into a single dataset containing many features for both the cytoplasm and nucleus.
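The information gain criterion described above can be illustrated with a short NumPy sketch. It assumes that a continuous feature is first discretized into equal-width bins (the binning scheme and the function names are assumptions, not details taken from the study) and then computes the class entropy minus the weighted entropy within each bin.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels, bins=4):
    """H(labels) minus the weighted entropy of labels within each feature bin."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    binned = np.digitize(feature, edges[1:-1])   # bin index for every sample
    conditional = 0.0
    for b in np.unique(binned):
        subset = labels[binned == b]
        conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional

# Toy data (illustrative only): one feature separates the classes, one does not.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
informative = np.array([0.1, 0.2, 0.15, 0.3, 0.9, 0.8, 0.95, 0.7])
uninformative = np.array([0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.2, 0.8])

gain_informative = information_gain(informative, labels, bins=2)
gain_uninformative = information_gain(uninformative, labels, bins=2)
```

Features with a larger gain reduce the uncertainty about the class label more and are therefore ranked higher.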

Random forest feature importance and information gain are widely used to estimate feature importance in clinical data [39, 40]. In particular, these techniques are highly effective and computationally efficient for high-dimensional data [41], because they build trees for individual feature sets, measure only their scores, and are not affected by outliers. In addition, they can handle non-linear relationships between the features and the target variable. They also handle noisy data effectively by aggregating the tree scores, which yields an accurate measure of each feature’s importance [42]. Other approaches, in contrast, have significant disadvantages in handling high-dimensional and complex data. Statistical approaches rely on individual statistical tests, and high dimensionality or outliers often affect these tests and result in biased feature importance rankings. Hybrid methods are computationally inefficient, and their parameter sensitivity is also challenging. The two selected techniques avoid these problems and are therefore more suitable for this study than the alternatives. In our experiments, the overall best result is achieved with the random forest importance technique, so it is chosen as the feature selection technique.

One of the novelties of our study is identifying the significant features that contribute to the classification. The feature ranking analysis shows, for example, that cytoplasmic features are considerably more significant than nucleus-based ones. The cytoplasmic portion is essential because it serves as the cervical cell’s ROI. The cytoplasm differs in shape from cell to cell; for example, the cytoplasm of the normal class has a round, oval, or polygonal shape [43], whereas the cytoplasm of the abnormal class typically has a three-dimensional cluster vesicular shape [44]. Consequently, the cytoplasm’s handcrafted features differ from class to class. The nucleus, on the other hand, mostly lies in the center of the cell. Since the nucleus area is relatively small, features including minor axis length, mean, brightness, area, solidity, and EquivDiameter did not vary much among the classes. In addition, four distinct feature sets have been developed for an accurate evaluation. The first set contains the top 20 features, which are mostly cytoplasm-based. Similarly, three further sets are formed with the top 30, 40, and 50 features according to their significance. We have also found that the entropy- and energy-based features are comparatively more significant. This analysis indicates which features deserve the most attention during classification.
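Forming nested feature sets from a ranking can be sketched in a few lines. The feature names and importance scores below are invented purely for illustration and are not results from this study; the helper name `top_k_feature_sets` is likewise hypothetical.

```python
def top_k_feature_sets(scores, sizes):
    """Return, for each requested size k, the k features with the highest score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {k: ranked[:k] for k in sizes}

# Hypothetical importance scores (illustrative values only).
scores = {
    "cyto_entropy": 0.12,
    "cyto_energy": 0.10,
    "nuc_area": 0.02,
    "cyto_area": 0.08,
    "nuc_mean": 0.01,
}
feature_sets = top_k_feature_sets(scores, sizes=(2, 4))
```

With the full ranking, calling the helper with sizes such as 20, 30, 40, and 50 would produce sets of the kind described above, each one containing the previous one.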

4 Methodology

This section demonstrates the systematic approach of this study which encompasses the graph representation from the features, a detailed ablation table for the model optimization and the architectural description of the proposed model.

4.1 Graph representation

Making a compatible graph dataset from tabular information is challenging. In this study, we generate a high-correlation-based graph for reliable performance. A graph is an effective data structure for representing the relationships among multiple objects [45]; it is a set of nodes, or vertices, connected by edges. First, the CSV file containing the significant features and the respective class labels is read. Each row represents an individual node. The node ID is the unique row index, and the extracted features are the node features. Three dictionaries, namely source_dict, target_dict, and node_dict, are created for mapping to the proper feature values. The source_dict dictionary maps the corresponding index of the feature columns. One node is placed in source_dict, while the remaining nodes are stored in target_dict. The source node is then compared with every node in target_dict using the Pearson correlation coefficient. This process continues until all the nodes have been iterated. Because each row index corresponds to one image, this technique quantifies the relationship between the images. Since this study considers only strong positive correlations, the threshold value is fixed at 0.5. Two nodes are connected if their correlation value exceeds this threshold, and the associated correlation value is used as the edge weight. Node_dict maps the extracted unique feature values. This process results in the desired graph dataset for further validation and analysis. Figure 4 presents a conceptual representation of graph construction using the selected features.

Fig. 4 An abstract view of graph construction using the features

The graph is undirected. Since the CSV file contains a total of 6580 indices and each index is a node, the graph consists of 6580 nodes. Furthermore, the 19,160 pairs of nodes meet the threshold requirements, resulting in 289,160 edges. The proposed GCN model further utilizes this graph.
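The construction described in this section can be sketched as follows, assuming that each row of the feature table is one node and that two nodes are linked when the Pearson correlation between their feature vectors exceeds 0.5. The function name `build_edges` and the toy feature matrix are hypothetical, and the vectorized computation stands in for the dictionary-based loop described above.

```python
import numpy as np

def build_edges(features, threshold=0.5):
    """Connect node pairs whose row-wise Pearson correlation exceeds the threshold."""
    corr = np.corrcoef(features)         # shape (n_nodes, n_nodes)
    n = corr.shape[0]
    src, dst = np.triu_indices(n, k=1)   # visit each unordered pair once
    weights = corr[src, dst]
    keep = weights > threshold           # strong positive correlations only
    return src[keep], dst[keep], weights[keep]

# Toy node features (illustrative): four nodes with four features each.
features = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.2, 7.9],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
])
src, dst, weights = build_edges(features)
edges = list(zip(src.tolist(), dst.tolist()))
```

Here node 2 stays isolated because it is negatively correlated with every other node, while the retained correlation values serve directly as edge weights.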

An advantage of this representation is that it effectively addresses minor data imbalance issues. The generated graph has a substantial number of nodes and edges. Since it considers only the strongest relationships between the nodes, the graph dataset eliminates irrelevant or outlier data and represents the explicit relationships between the nodes. Consequently, the graph-structured dataset addresses data heterogeneity at different levels [46]. Moreover, it allows the model to capture more complex relationships between the entities that were not apparent in the tabular form. It also enhances the global connectivity between the entities. In summary, the generated graph-structured dataset has a flexible and interconnected representation that effectively addresses data imbalance and heterogeneity.

4.2 Ablation study

This section discusses tuning the optimal parameters for the proposed model. Selecting the optimal parameters for the model is a critical task. These parameters influence the model’s performance. Fine-tuning of these parameters can significantly boost performance. Table 3 provides an empirical experimentation for the model optimization.

Table 3 Investigation of parameters for the model optimization

Six parameters are fine-tuned to determine the best parameter set for building the model. The first case study concerned the configuration of hidden units. The number of hidden units governs how well the model captures patterns from the samples: too few units often lead to underfitting, while too many increase the computational cost. We initially conducted the experiments with small unit sizes; the optimal size was found to be 32, which achieves an accuracy of 95.60% while effectively addressing the underfitting problem. This study experimented on high-dimensional graph-structured data containing a large number of nodes and edges. A small batch size requires more iterations, which may cause instability during training, whereas larger batch sizes can handle the extensive feature sets and yield more accurate gradient estimates. Accordingly, the best accuracy of 97.17% is achieved using the larger batch size. The subsequent experiment determined the dropout value. A dropout value of 0.2 decreased the accuracy to 94.03%. Besides, a high dropout value is more suitable for larger or more complex datasets, so the dropout value of 0.5 is kept for further experiments. This study investigated two learning rates, and a learning rate of 0.001 significantly increased the accuracy to 99.11%. The experiments were conducted for 100 epochs using the Adam optimizer, which yielded the optimal configuration of these parameters. Altering these parameters reduces the accuracy and makes the model computationally less efficient. After these case studies, the optimal configuration is determined, resulting in satisfactory performance with lower time complexity.

4.3 Proposed model

GCNs have successfully attracted interest in recent years in analyzing graphs and aiming to map the nodes in a specific category [47]. In this study, we introduced GCN techniques to perform the node classification on the graph representation of the proposed tabular dataset. A single graph is constructed, and the weighted data is prepared for training. We also enumerate the feature’s name and its corresponding source and target labels. Figure 5 demonstrates the model building of the proposed study. Moreover, a feed-forward neural network (FFN) block is built to generate the message from the nodes and sum the node vectors to accomplish the node classification.

Fig. 5 Proposed model to perform the node classification of cervical cancer

The proposed model comprises a single input layer, five FFN blocks with five skip connections between them, and finally a dense layer. Each block also includes a batch normalization layer, an activation function, and dropout to improve the generalization of the model. The input layer is forwarded to the first and second FFN blocks. A skip connection reduces the parameter size and operation cost. Then, three more FFN blocks follow, each with its respective skip connection. The output of the last skip connection is forwarded to a dense layer with five output units, since our dataset has five classes.
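A forward pass through a stack of FFN blocks with skip connections can be sketched in NumPy as below. Several details are assumptions made for illustration: the activation is taken to be ReLU, dropout is omitted because the sketch runs in inference mode, the weights are random rather than trained, the exact wiring of the skip connections is simplified, and the graph convolution layers are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn_block(x, weight, bias, eps=1e-5):
    """Batch normalization -> dense -> ReLU (dropout is a no-op at inference)."""
    x_norm = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    return np.maximum(x_norm @ weight + bias, 0.0)

def forward(x, params, out_weight, out_bias):
    h = ffn_block(x, *params[0])             # first block maps inputs to the hidden width
    for weight, bias in params[1:]:
        h = h + ffn_block(h, weight, bias)   # skip connection around each later block
    logits = h @ out_weight + out_bias       # final dense layer, one unit per class
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# Assumed sizes: 40 input features, 32 hidden units, 5 classes, batch of 8 nodes.
n_features, hidden, n_classes = 40, 32, 5
params = [(rng.normal(0, 0.1, (n_features, hidden)), np.zeros(hidden))]
params += [(rng.normal(0, 0.1, (hidden, hidden)), np.zeros(hidden)) for _ in range(4)]
out_weight = rng.normal(0, 0.1, (hidden, n_classes))
out_bias = np.zeros(n_classes)

probs = forward(rng.normal(size=(8, n_features)), params, out_weight, out_bias)
```

Each row of `probs` is a distribution over the five classes; adding the block input to its output lets information bypass the block unchanged.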

The FFN block preprocesses the node features to generate the initial node representation. The convolution operation is performed across the graph to extract features through the message-passing technique [48], which operates on a node and its connected neighbor nodes. The main objective of the message-passing technique is to iteratively generate an optimal node embedding that preserves neighborhood information [49]. It consists of two steps: aggregation and update.

The graph convolution operation for the proposed model can be described using the following mathematical expressions [50]:

$$\boldsymbol{n}_{i}^{(k)}=\boldsymbol{h}_{i}^{(k)}\boldsymbol{W}^{(k)} \tag{1}$$
$$\boldsymbol{h}_{i}^{(k+1)}=\sigma\left(\sum_{j\in\mathcal{N}_{i}} e_{ij}\,\boldsymbol{n}_{j}^{(k)}\right) \tag{2}$$

In Eqs. 1 and 2, \(\boldsymbol{h}_{i}^{(k)}\) denotes the embedding of the \(i^{\text{th}}\) node resulting from the \(k^{\text{th}}\) graph convolutional layer, \(\boldsymbol{W}^{(k)}\) is the weight matrix, \(\boldsymbol{n}_{i}^{(k)}\) is the product of the node embedding and the weight matrix, \(\mathcal{N}_{i}\) is the set of neighbors of node \(i\), \(e_{ij}\) indicates the importance of node \(j\)’s features to node \(i\), and \(\sigma\) is the activation function. The weighted summation over the features of a node’s neighbors, followed by the activation function, updates the node embeddings. The neighborhood aggregation term combines the features of the neighboring nodes to form the node representation. Different aggregation functions can be used. In our study, the mean strategy is used: the weighted sum over the neighbor nodes is divided by the number of neighbor nodes. The update function then assigns the new embeddings to the corresponding nodes. We implemented two GCN layers in our model. The FFN block processes the output of the second GCN layer, as illustrated in Fig. 5. This block converts the node embeddings into five output values, and the softmax activation function is applied to identify the cervical cancer class label.
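A single graph convolution step following Eqs. 1 and 2 with mean aggregation can be sketched as below. The function name `gcn_layer`, the choice of ReLU for \(\sigma\), and the toy graph are assumptions for illustration; edges are treated as undirected, so each edge passes a message in both directions.

```python
import numpy as np

def gcn_layer(h, weight, edges, edge_weights):
    """One message-passing step: transform (Eq. 1), then aggregate and activate (Eq. 2)."""
    n = h @ weight                                  # Eq. 1: n_i = h_i W
    aggregated = np.zeros_like(n)
    degree = np.zeros(h.shape[0])
    for (i, j), e in zip(edges, edge_weights):      # undirected: send messages both ways
        aggregated[i] += e * n[j]
        aggregated[j] += e * n[i]
        degree[i] += 1
        degree[j] += 1
    aggregated /= np.maximum(degree, 1)[:, None]    # mean over the neighbors
    return np.maximum(aggregated, 0.0)              # sigma, assumed to be ReLU

# Toy graph: node 0 is linked to nodes 1 and 2 with weights 1.0 and 0.5.
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weight = np.eye(2)                                  # identity keeps the arithmetic visible
edges = [(0, 1), (0, 2)]
edge_weights = [1.0, 0.5]
h_next = gcn_layer(h, weight, edges, edge_weights)
```

For node 0, the weighted sum of its two neighbors is [0.5, 1.5], and dividing by the neighbor count gives [0.25, 0.75]. Stacking two such layers lets each node receive information from nodes two hops away.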

5 Result analysis

This section comprehensively depicts the features list obtained from the best feature ranking technique and analyzes the result of the proposed GCN model. It includes the loss and accuracy curve to explore the generalization of the model. A confusion matrix is added to reflect accurate classification accuracy, and insightful interpretations are incorporated to understand the decision-making and reliability of the model. Additionally, a comparison with several machine learning models is performed to highlight the strengths of the model.

In this paper, cervical cell images are classified into five classes. We implemented a GCN model to validate our approach. Since this study is a prioritized feature-based performance analysis, all the models are trained separately on each individual feature set. Figure 6 shows all the features and their corresponding importance scores. To formulate the 40-feature set, we select the 40 features with the highest importance scores and run the experiment on them. The accuracy curve and confusion matrix generated from the experiment are analyzed comprehensively.
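
The feature-set construction described above amounts to ranking features by importance and keeping the top k. The sketch below shows that step in plain Python. The function names and the importance scores are hypothetical placeholders, not values from this study; only the feature names are taken from the paper.

```python
def top_k_features(importance, k):
    """Return the k feature names with the highest importance score, best first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def select_columns(rows, feature_names, selected):
    """Keep only the selected feature columns of a tabular dataset, in ranked order."""
    idx = [feature_names.index(name) for name in selected]
    return [[row[i] for i in idx] for row in rows]

# Hypothetical importance scores (placeholders for illustration only).
scores = {
    "Cyto_area": 0.09,
    "Cyto_Solidity": 0.07,
    "Nuclei_correlation": 0.05,
    "Cyto_homogeneity": 0.02,
}
selected = top_k_features(scores, k=2)

names = list(scores)
rows = [[10.0, 0.9, 0.3, 0.5], [12.0, 0.8, 0.1, 0.4]]  # hypothetical samples
reduced = select_columns(rows, names, selected)
```

With a random forest, the `importance` mapping would come from the fitted model's per-feature importance values; setting `k` to 30, 40, or 50 yields the feature sets compared in this section.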

Fig. 6 Feature importance score and their corresponding name

5.1 GCN result analysis

This section analyzes the extensive experiments of GCN to perform cervical cancer cell classification. Since we consider only the strongly connected features to construct the graph, the model performs well while remaining efficient. We tuned the hyperparameters to find the optimal configuration for our GCN model, and the identified configuration demonstrates its robustness by achieving superior accuracy. A batch size of 256 and a dropout value of 0.2 effectively improve the regularization of the model. The model was initially trained for only 50 epochs, which gave poor results. After the number of epochs is increased to 100, the model reaches its peak and provides the best accuracy for the 40- and 50-feature sets.

Table 4 highlights that the proposed GCN model yields a consistent performance for all the corresponding feature sets. The results drop slightly for the 30-feature set since some statistical and morphological features are not included in this set. The other two feature sets provide nearly identical performance: the 40-feature set obtains a test accuracy of 99.11%, and the 50-feature set achieves 99.08%. One of the primary causes is that the 40- and 50-feature sets include all the relevant texture, morphology, shape descriptor, and statistical features. We formulate all these feature sets after applying the random forest feature importance technique, in which a higher importance score marks a more significant feature. We selected the 40-feature set as the optimal dataset for this study since the 50-feature set requires more computational time and cost and also scores lower than the 40-feature set on some other performance metrics.

Table 4 GCN Result for all three feature sets of the SipakMed dataset

The loss and accuracy curves are an excellent visualization tool for assessing the performance of the model. The loss and accuracy curves for our GCN model on the SipakMed dataset are displayed in Fig. 7. The accuracy curve provides insights into how accurately the model predicts, while the loss curve shows how the prediction error gradually declines. Besides, the curves show the effective generalization of the proposed model, as it did not exhibit any underfitting or overfitting tendencies throughout the training and validation sessions.

Fig. 7 Loss and accuracy curve of the proposed GCN model

Figure 8 presents the confusion matrix and K-fold cross-validation results used to estimate the model's reliability and its capability on unseen data. The confusion matrix highlights the correct classifications and misclassifications of the proposed model. It shows that the model is not biased toward any class; instead, it keeps misclassifications low across all the classes. The low misclassification rate signifies the potential of the proposed approach, as it successfully mitigates false predictions. K-fold validation is a valuable technique for assessing the model's prediction ability. We evaluated the model with four values of K: 3, 5, 7, and 10. The GCN model demonstrates satisfactory performance across all of them and obtains the best accuracy of 99.17% with 7 folds. The consistently high accuracy indicates the generalization capability of the model.
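
As a rough illustration of the two evaluation tools discussed above, the sketch below builds K-fold train/test index splits and a multi-class confusion matrix in plain Python. The function names are hypothetical. The folds are contiguous and unshuffled, which assumes the samples were shuffled beforehand; the paper does not state how its folds were drawn.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds over range(n_samples)."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts the samples of true class t that were predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

# Fold sizes for the K values used in this study, on a hypothetical 20-sample set.
sizes = {k: [len(test) for _, test in k_fold_indices(20, k)] for k in (3, 5, 7, 10)}

# Hypothetical labels for a 3-class toy problem.
cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3)
```

Each fold serves once as the held-out test set while the model is retrained on the remaining folds, and the per-fold accuracies are then compared, as in Fig. 8.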

Fig. 8 Confusion matrix and K-fold cross-validation of the GCN model

5.1.1 Validation of the proposed technique

After attaining a superior performance on the SipakMed dataset, we conducted another experiment on a second dataset using the same approach. This investigation highlights the efficacy of our approach and supports its reliability, which addresses the concern of false predictions and mitigates the associated risks to patient outcomes. We follow the same approach and configuration on the Herlev dataset for validation. The methodology, including the nucleus and cytoplasm segmentation, feature extraction, graph representation, and model deployment, remains the same for both datasets, which ensures a fair evaluation. Table 5 lists the results for all performance metrics obtained on the Herlev dataset.

Table 5 Performance analysis of the proposed GCN model in the Herlev dataset

The performance assessment signifies the ability of our proposed model, as the outcomes are close to those on our primary dataset. It also highlights the consistency of the proposed techniques. Accuracy, specificity, precision, and recall reach 98.18%, 97.83%, 97.81%, and 97.91%, respectively. The false prediction rates are also low, which indicates that the model's positive predictions are reliable. The agreement between the results on our main dataset and this validation dataset enhances the reliability of this study across a wide range of samples.
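
The metrics reported above can all be derived from a multi-class confusion matrix. The following sketch shows one common way to do so, using one-vs-rest counts per class and macro averaging. The averaging scheme is an assumption, since the paper does not state which one was used, and the function names and toy matrix are hypothetical.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def per_class_metrics(cm):
    """One-vs-rest precision, recall, and specificity for every class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c samples predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other classes predicted as class c
        tn = total - tp - fn - fp
        metrics.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    return metrics

def macro_average(metrics, key):
    """Unweighted mean of one metric across classes (assumed averaging scheme)."""
    return sum(m[key] for m in metrics) / len(metrics)

# Hypothetical 3-class confusion matrix: rows are true classes, columns are predictions.
toy_cm = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
toy_metrics = per_class_metrics(toy_cm)
toy_recall = macro_average(toy_metrics, "recall")
```

Specificity is computed per class by treating every other class as negative, which is why it can stay high even when a single class is occasionally confused with another.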

5.2 Interpretability of the proposed model

This section investigates the interpretability of the proposed model using two plotting techniques. They give insight into the contribution of the features that most strongly influence the model's prediction and visualize the node embeddings.

t-SNE (t-Distributed Stochastic Neighbor Embedding) and layer-wise relevance propagation (LRP) are effective techniques for interpreting the decision-making of the proposed model. This study employs a t-SNE plot to understand the node embeddings learned by the model. Additionally, layer-wise relevance propagation on the dense layer is investigated to explore the significance of each feature in the model's predictions. Node embeddings capture the patterns learned from the node features. t-SNE is a robust method for capturing the relationships between features in a high-dimensional dataset. The t-SNE plot in Fig. 9 demonstrates that distinct nodes with similar attributes are clustered closely. Each of the five classes is depicted by points of a different color. The yellow points are located near [-60, -20], the black points lie comparatively higher, between [-20, 20], and the points of the other classes are clustered in distinct ranges. This reflects the ability of the model to capture meaningful relationships between the node features. It also highlights that the model preserves the local similarities between node features and differentiates among the classes. Besides, there are no extreme outliers, and the points lie close together, indicating that the model has learned a compact, lower-dimensional representation of the data. The LRP scoring plot displays the features' contributions to the model's outcomes. We applied the LRP technique to the dense layer, which can provide more accurate insights into the dominance of the features. The plot shows that some feature indices have higher relevance scores than others; a higher LRP score means that the feature is more influential in determining the model's output.
For instance, feature indices 8, 11, 12, 23, and 36, which correspond to Nuclei_correlation, gabor_Cyto_Entropy, Cyto_homogeneity, Cyto_Solidity, and Cyto_area, are the most influential features for the model's prediction. A few other features also have a crucial impact. The value distribution of these impactful features helps to determine the class label. Thus, these visual interpretations help medical experts understand the model's decision-making process when diagnosing a new patient. After all the experiments are conducted, the model reports a class label to the experts, who can validate this outcome by examining the cluster in which the sample falls and determining which class's feature relevance pattern matches the new sample. Thus, the risk of false diagnosis is substantially reduced. Together, the t-SNE and LRP plots interpret the model's decision-making by showing the holistic relationship between the nodes and their clusters and revealing each feature's importance in the model's prediction.
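
To illustrate how relevance scores of the kind discussed above can be obtained for a dense layer, the sketch below applies the LRP epsilon rule to a single fully connected layer. The paper does not state which LRP rule was used, so the epsilon rule, the function names, and the toy values are assumptions.

```python
def dense_forward(a, w, b):
    """Pre-activations of one dense layer: z_j = sum_i a_i * w[i][j] + b_j."""
    return [sum(a[i] * w[i][j] for i in range(len(a))) + b[j] for j in range(len(b))]

def lrp_epsilon(a, w, b, relevance_out, eps=1e-6):
    """Redistribute output relevance to the inputs with the epsilon rule:
    R_i = sum_j a_i * w[i][j] / (z_j + eps * sign(z_j)) * R_j."""
    z = dense_forward(a, w, b)
    relevance_in = [0.0] * len(a)
    for j, zj in enumerate(z):
        # The small stabilizer keeps the division well defined when z_j is near zero.
        denom = zj + (eps if zj >= 0 else -eps)
        for i in range(len(a)):
            relevance_in[i] += a[i] * w[i][j] / denom * relevance_out[j]
    return relevance_in

# Toy layer: 2 input features, 1 output unit (hypothetical values).
activations = [1.0, 2.0]
weights = [[1.0], [0.5]]  # weights[i][j] connects input i to output j
bias = [0.0]
relevance = lrp_epsilon(activations, weights, bias, relevance_out=[1.0])
```

With a zero bias, the input relevances sum to approximately the output relevance, so each score can be read as that feature's share of the prediction.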

5.3 Comparison with machine learning model

A comparative evaluation is conducted using several machine learning models to demonstrate the improved performance and dependability of the suggested approach. Figure 10 displays the result analysis of ML models with the GCN model.

Several performance metrics were employed, such as accuracy, precision, recall, and F1 score. According to the findings presented in Fig. 10, the proposed Graph Convolutional Network (GCN) model outperforms all the other machine learning (ML) models. It attained the highest accuracy of 99.11%, whereas the closest ML model, random forest, reached 89.91%. The other models fell short of the GCN model across all performance metrics. Gradient boosting and decision tree achieved accuracies of 82.32% and 79.02%, respectively. KNN and SVM performed considerably worse than the other models (Fig. 10).

Fig. 9 t-SNE and Layer Relevance Propagation plot of the proposed GCN model

Fig. 10 Comparison of result analysis of ML models with the GCN model

5.4 Comparison with existing literature

Table 6 compares our proposed strategy with relevant prior studies and highlights its performance advantages. It provides a valuable benchmark analysis, contextualizing our findings and emphasizing the relevance of this study in cervical cancer diagnosis.

Table 6 Performance Comparison of the proposed model with existing literature

Sabina et al. experimented with a hybrid model on the Herlev dataset and obtained an accuracy of 99.7%. Jun et al. introduced a graph convolution network to classify the cervical cell images of the SipakMed dataset and obtained an accuracy of 98.37%; our proposed model outperforms that study on the same dataset. Orhan et al. deployed Darknet with SVM on the SipakMed and Mendeley datasets and obtained accuracy values of 98.26% and 99.46%, respectively. Ritesh et al. employed a Transformer model combined with CNN on the SipakMed dataset to reach an accuracy of 97.6%. On the Custom Pap Smear Dataset, Wang et al. used a Chain Genetic Algorithm with SVM and obtained an accuracy of 96.03%. Mehmood et al. utilized Random Forest on a UCI tabular dataset and achieved an accuracy of 93.6%. Allehaibi et al. developed a Deep CNN model for their custom dataset and reported an accuracy of 98.1%. The comparison with existing literature demonstrates the competitiveness of our proposed approach. On the SipakMed dataset, the proposed GCN architecture achieves the highest accuracy among the compared studies, giving insights into the potential practical implementation of our techniques.

6 Discussion

Several studies have attempted different techniques to develop an effective cervical cancer diagnosis system. Some focus on detecting the cytoplasm and nucleus area from cells, while others adopt deep learning models for high classification accuracy. In this study, we proposed a computer-aided system with meaningful explanations for classifying cervical cancer cell images. A novel method was investigated for isolating the nucleus and cytoplasm from the images, identifying the key features that describe the nucleus and cytoplasm properties separately, training machine learning models, and finally implementing a GCN architecture to confirm the stability and increase the acceptability of this method. This experiment is conducted on a histopathological cervical image dataset comprising five classes. We integrate various techniques and present an efficient segmentation algorithm to separate the nuclei from the surrounding tissue while leaving the cytoplasm area unaffected. This algorithm can precisely segment the nuclei and cytoplasm simultaneously and delineate the object area by drawing a boundary line. We generate 26 handcrafted features from the nucleus and the cytoplasm individually. To determine the importance of the features, we adopt two feature ranking techniques, of which random forest feature importance is chosen for this study. Feature analysis indicates that the cytoplasm properties, such as shape, morphology, and contents, are more significant than those of the nucleus.

We create four feature sets to determine which set contains the most critical features. On each feature set, four tree-based machine-learning models were evaluated. The results show that all the models perform well on the 30- and 40-feature sets. The RF model outperforms all other models in terms of accuracy, obtaining the best accuracy of 89.91% on the 40-feature set. The high values of the different performance metrics support our methodology. Another novelty of this study is introducing the graph mechanism to classify the cervical cell. The proposed GCN architecture attains 99.11% node classification accuracy. This superior performance rests primarily on two things: (i) node representation and (ii) node propagation using the message-passing technique. We address the challenging task of transforming the tabular dataset into its corresponding graph representation by considering the potential relationships among the critical features. The Pearson correlation technique is utilized to find the strongest relationships between the nodes, and each such pair of nodes is connected by an edge. The message-passing technique then computes and updates the node representations. In addition, the proper configuration contributes to the efficacy of the proposed model. To assess the effectiveness of the proposed approach, we tested its performance on the Herlev dataset, and it provided a satisfactory outcome. To interpret the decision-making of the proposed model, we utilize t-SNE and LRP plots, which provide valuable insights into the node embedding patterns and the feature contributions to the model's outcome. This study has significant potential for real-time implementation that can help improve treatment in the healthcare domain. Integrating segmentation, feature analysis, and graph-based approaches can serve as a clinical workflow for cervical cancer treatment. 
Segmentation and feature analysis automate the identification of the cytoplasm and nucleus shape, area, and density and uncover cellular properties with distinct patterns that help to understand the abnormality of the cell. The inherent relationships can be represented using graphs, and the optimized GCN model processes these patterns and predicts the condition of the cell. In real-time systems, these strategies improve the visual interpretability of the diagnosis tool. Moreover, the approach optimizes healthcare resources and offers healthcare practitioners a rapid, automated diagnostic tool. However, the dependability of the proposed method can be improved by incorporating real-time data, since experimenting with such data helps the model learn the dynamic distribution of the samples. Besides, the tool is not intended to make decisions by itself; rather, it will help radiologists come to a decision. The insights produced by our proposed method will be validated by doctors before being passed on to the patients. This is how this study can mitigate potential risks. In summary, this study's findings can substantially contribute to developing a reliable cervical cancer diagnosis system that can provide better patient outcomes.
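
The tabular-to-graph transformation summarized in this discussion can be sketched as follows. The sketch assumes that each node is the feature vector of one cell and that an edge links two nodes whose Pearson correlation reaches a cutoff. The function names, the threshold value, and the toy feature vectors are hypothetical, since the paper states only that the strongest relationships are connected.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant vector has no defined correlation; treat as unrelated
    return cov / (std_x * std_y)

def build_edges(node_features, threshold):
    """Connect every pair of nodes whose correlation is at least `threshold`."""
    edges = []
    for i in range(len(node_features)):
        for j in range(i + 1, len(node_features)):
            if pearson(node_features[i], node_features[j]) >= threshold:
                edges.append((i, j))
    return edges

# Hypothetical feature vectors for three nodes.
features = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
edges = build_edges(features, threshold=0.9)  # the 0.9 cutoff is an illustrative choice
```

The pairwise loop is quadratic in the number of nodes, so a dataset with thousands of cells would typically call for a vectorized correlation matrix instead; the loop form is kept here for readability.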

7 Conclusions

Cervical cancer appears to be a significant health concern for women and causes many deaths worldwide. The alarming rise in deaths emphasizes the need for effective prevention and diagnosis. This study introduces a computer-aided approach for automated cervical cancer diagnosis by analyzing histological images. This study’s key contributions include simultaneous nucleus and cytoplasm segmentation, comprehensive feature analysis, and accurate classification using a GCN model. The nuclei are correctly separated from the surrounding tissue while maintaining the integrity of the cytoplasm due to the combination of several approaches and an effective segmentation algorithm. The result of the feature analysis highlighted the significance of cytoplasmic characteristics in identifying cervical cancer by showing that they were more important for classification than nucleus properties. In addition, another novelty of our study is the transformation of the tabular dataset into a graph representation, which enables efficient relationship analysis among important features. The Pearson correlation technique was essential for identifying the strongest relationships. The introduction of the graph mechanism utilizing the GCN architecture resulted in a significant achievement of 99.11% accuracy in node classification. The success of our model can be attributed to effective node representation and propagation using the message-passing technique. To investigate the stability of the proposed method, rigorous experiments are conducted. Furthermore, the model was tested on another dataset to assess its efficacy, and it obtained an accuracy of 98.18% in the validation dataset. Our approach demonstrated promising results, providing a reliable method for improving cervical cancer diagnosis. 
Consequently, the proposed method contributes to developing an essential tool for combating cervical cancer, which can ultimately help preserve lives and improve healthcare standards for women globally.

8 Future scope

This work has employed segmentation, feature extraction, and node classification techniques to diagnose cervical cancer from histopathological images. However, the potential of this study can be increased by including other advanced techniques that also improve the practicality of this diagnostic process. Several extensions can build on the existing findings. Along with segmentation and classification, detection techniques can be added. Nucleus and cytoplasm detection would provide more valuable insights into the structural distribution of the cell and detect abnormalities more precisely. The experiments can be conducted on multi-modal datasets or other medical imaging to assess the effectiveness of the proposed model more broadly. Real-world deployments can further the use of AI in the healthcare domain. By pursuing these directions, the study can contribute to a more precise and advanced diagnostic method for cervical cancer and substantially improve patient outcomes.