1 Introduction

Cancer is the second leading cause of death in the United States today, with lung cancer ranking third. Lung cancer is one of the most common types of cancer and one of the leading causes of death. Cancer arises from abnormal cell growth in the body [1,2,3]. Masses of these abnormal cells are referred to as "tumours". Tumours are divided into two types: benign and malignant [4, 5]. Benign tumours rarely pose a threat to life, whereas malignant tumours can. Benign tumours have smooth and regular shapes, whereas malignant tumours have irregular shapes and spread to other parts of the body to form new cancerous nodules [6,7,8]. Malignant tumours are referred to as "cancer" in this context. This study describes a Convolutional Neural Network (CNN)-based technique for determining whether lung tumours are malignant or benign [9].

CT (computed tomography) imaging is the best imaging technique for diagnosing lung cancer because it can identify all known and unknown nodules [10]. Early detection of lung cancer through CT screening can help make the disease more treatable. In general, if a cancer case is identified early, diagnosed, and treated properly, the patient's chances of living a long life increase [11].

Deep learning is one of the approaches used for lung cancer classification and recognition. It not only speeds up this critical task but also improves the accuracy and performance of CT image detection and classification [12]. Deep learning has gained prominence owing to its exceptional precision when trained on immense quantities of data. It is a critical component of data engineering and encompasses statistics and predictive modelling. It is a part of artificial intelligence with immense potential and versatility, and it is a problem-solving strategy that tackles an issue from beginning to end [13]. It is especially beneficial for data professionals who are responsible for acquiring, analysing, and interpreting enormous amounts of data, since it speeds up and simplifies the process. In classical machine learning, the learning procedure is supervised, and the developer must be extremely explicit in telling the computer what kinds of features to examine in order to decide whether an image contains an object. The benefit of deep learning is that the algorithm builds the feature set without supervision and classifies the input more accurately.

One of the most well-known deep neural networks is the CNN. Such a network consists of an input layer, hidden layers, and an output layer. Lung cancer affects both men and women equally. Uncontrolled tumour cell growth is extremely dangerous and should be treated as soon as possible. People's lives can be saved if they receive early diagnosis and treatment [14].

In this work, deep learning is used to detect lung cancer. Image enhancement methods improve the quality of the provided image [15]. The K-means method partitions the image into several parts, and this segmentation makes it easier to locate areas of interest and identify objects in images [16]. The rest of the paper is organized as follows.

Section 2 reviews the related literature with appropriate justifications and conclusions. The proposed methodology is presented in Section 3. Section 4 presents the results of the proposed method, and Section 5 concludes the paper.

2 Literature review

Several researchers have proposed and implemented various approaches for detecting lung cancer using machine learning and image processing techniques.

In 2022, Monica Ramakrishnan et al. [17] developed a technique for detecting lung cancer nodules in CT images. It implements a CNN that uses the pre-trained VGG model for feature extraction and an RNN for feature classification to identify pulmonary nodules. Machine learning, data mining, and image processing methods are used in this study to predict lung cancer nodules in high-risk patients. The nodule detection model was developed on a publicly available data set of lung CT images. By combining image processing and classification techniques, the study obtained an end-to-end method for detecting lung cancer nodules with 70% accuracy. Because each patient's CT scan contains a large amount of data, processing the images takes a long time.

In 2021, Radhanadh Patra et al. [18] proposed a method for predicting lung cancer using KNN, ML, and RBF. Several machine learning classifiers were used to categorise the publicly available lung cancer data in the UCI machine learning repository as benign or malignant. The Weka tool preprocesses the input data, converts it to binary form, and then applies a variety of well-known classifiers to label the data set as cancerous or non-cancerous. The proposed RBF classifier achieves an accuracy of 81.25% and is regarded as the most practical classifier for lung cancer prediction data. Its disadvantage is the relatively low accuracy.

In 2019, Md. Rashidul Hasan et al. [19] proposed lung cancer detection and classification using image processing and statistical learning. A genetic technique was used to select specific features, and the features were extracted using GLCM. Support vector machines (SVM) were used to classify the stages of lung cancer. An algorithm was created to determine precisely whether the lungs are malignant or not. The proposed method is expected to outperform current technologies and so aid radiologists in the precise and early detection of cancer. The technique was tested on 198 slices of CT images from the Kaggle dataset of cancer patients at various stages, with satisfactory results. The proposed method's accuracy on this dataset is 72.2%. The disadvantage of this method is its significantly lower accuracy.

In 2019, Dhanush Raj et al. [20] published a technique for detecting lung cancer using various image processing methods. The most commonly used approach is the processing of CT images. The method covers three stages: image pre-processing, image segmentation, and image classification. The problems of lung cancer detection are also discussed in this system. The input datasets are lung CT images. The first step enhances the extracted CT images. In the following phase, the enhanced image is segmented. The Watershed method is used for image segmentation, and it is extremely vulnerable to local minima. As a result, the patient can receive a late diagnosis of lung cancer.

Suren Makaju et al. [21] proposed a method for detecting lung cancer on CT scan images in 2018. Many computer-aided methods based on machine learning and image processing have been researched and tested. To identify the cancerous nodule from the lung CT scan image, the proposed system employs watershed segmentation for detection and SVM for classifying the nodule as malignant or benign. The proposed model detects cancer with a 92% accuracy rate. The disadvantage of using this method is that it is computationally very expensive due to the high cost of watershed segmentation.

Aggarwal et al. [22] proposed a model in 2015 that differentiates between nodules and the structure of normal lung anatomy. Grey-level, statistical, and geometric properties are extracted as features. Optimal thresholding is used for segmentation and LDA as the classifier. The system has 84% accuracy and 53.33% specificity. This methodology detects the cancer nodule with fairly high accuracy, but it is still inadequate.

In 2015, Roy, T. et al. [23] developed a method for detecting lung cancer nodules that employs an active contour model and a fuzzy inference approach. To improve visual contrast, this method employs a grey transformation. The image is binarized before segmentation, and the resulting image is segmented using an active contour model. The fuzzy inference method is used to classify cancer. To train the classifier, features such as area, entropy, mean, correlation, major axis length, and minor axis length are extracted. The system's overall accuracy is 94.12%. This method has the disadvantage of being unable to distinguish between benign and malignant tumours.

In 2013, Dasu Vaman Ravi Prasad et al. [24] proposed a method for detecting lung cancer using segmentation, Gabor filters, augmentation, and pre-processing. The primary concerns of this study are image quality and accuracy. Image quality was evaluated and improved using low-precision pre-processing techniques based on the Gabor filter and Gaussian rules. The proposed method uses segmentation as the foundation for feature extraction. In comparison to the existing method, the proposed methodology yields very encouraging results. A normality analysis is performed on the basis of generic characteristics. The major features that can be recognised for accurate image comparison are pixel percentage and mask-labelling, with high accuracy and reliable operation. The disadvantage of this method is that it is only appropriate for precise (high quality) images.

None of the methods discussed above employs a qualitative framework that combines optimization, patch processing and deep learning techniques. As a result, their accuracy stays below 95%. Deep learning and patch processing approaches for detecting lung cancer on CT images are therefore proposed in this work to attain excellent accuracy.

3 Proposed methodology

Deep learning and patch processing are used in the proposed method to detect and classify CT scan images as benign or malignant. Figure 1 depicts the block diagram of the proposed system.

Fig. 1 Proposed block diagram

3.1 Input CT lung image

Initially, CT images are retrieved from public databases, which include the Lung Image Database Consortium (LIDC). The outcomes are validated on LIDC datasets. The noise in lung CT images is lower than in MRI and PET images. As a result, CT scans are utilised to diagnose cancer. The CT image has the additional advantage of being very clear, with low noise and distortion [25].

3.2 Image enhancement

Image enhancement is the process of improving the quality of a digitally recorded image so that the resulting image is superior to the original. Changing the brightness or contrast of an image, for example, is a simple enhancement [26]. Many images, including real-life photographs as well as satellite, aerial, and medical images, suffer from low contrast and noise [27]. In this work, each image is enhanced using a median filter followed by patch processing, which smooths the image and reduces noise.

3.2.1 Median Filter

First, the grayscale CT scan images are passed through a median filter. CT scans pick up some noise during the image acquisition procedure, and this noise can occasionally be confused with malignant tumours. To prevent false identification of nodules and detect cancer accurately, the noise must be reduced. A median filter is used to remove salt-and-pepper noise from the CT images [28].
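The effect of a median filter on salt-and-pepper noise can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the 3 × 3 window, the border handling (using only neighbours that lie inside the image) and the function name `median_filter` are assumptions made for illustration.

```python
from statistics import median_low

def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of grey levels.

    Assumption: border pixels use only the neighbours that fall inside
    the image; other border policies (for example padding) are possible.
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            # median_low always returns a grey level present in the window.
            out[r][c] = median_low(window)
    return out

# A flat region with a single "salt" pixel (255) in the middle.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
filtered = median_filter(noisy)
```

Because the isolated bright pixel is never the middle value of its neighbourhood, it is replaced by the surrounding grey level, while uniform regions are left unchanged.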

3.2.2 Patch processing

Patch processing divides the image into small patches and processes each patch separately. Patching is an appropriate technique for removing undesirable flaws from an image because it allows selected regions of arbitrary shape to be replaced with a surface fitted to other arbitrarily shaped regions plus a synthetic noise component. The patches (i.e., blocks) are extracted from the noisy input image and then handled separately to obtain an estimate of the actual pixel values [29]. A patch is a small (often rectangular) fragment of an image. For example, an 8 × 8 patch is a square patch that contains 64 pixels from a larger image (say, 256 × 256 pixels). Because patches are small, some image processing algorithms, such as de-noising and super-resolution, are easier to apply to patches than to the entire image. These algorithms divide an image into many smaller patches (for example, 8 × 8), operate on each patch separately, and then tile each patch back at its designated location.
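The split, process and tile cycle described above can be sketched in Python as follows. The helper names and the per-patch operation `smooth_patch` are hypothetical placeholders; the actual patch-level estimator used in this work is not specified here, so the sketch only shows the bookkeeping around it.

```python
def split_into_patches(image, patch=8):
    """Return a dict mapping (top, left) to a patch (a list of rows)."""
    rows, cols = len(image), len(image[0])
    patches = {}
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            patches[(top, left)] = [
                row[left:left + patch] for row in image[top:top + patch]
            ]
    return patches

def tile_patches(patches, rows, cols):
    """Place every patch back at its original (top, left) location."""
    out = [[0] * cols for _ in range(rows)]
    for (top, left), block in patches.items():
        for dr, block_row in enumerate(block):
            out[top + dr][left:left + len(block_row)] = block_row
    return out

def process_patchwise(image, operation, patch=8):
    """Split the image, apply `operation` to each patch, tile the results."""
    rows, cols = len(image), len(image[0])
    processed = {pos: operation(block)
                 for pos, block in split_into_patches(image, patch).items()}
    return tile_patches(processed, rows, cols)

def smooth_patch(block):
    """Hypothetical per-patch step: pull each pixel halfway to the patch mean."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    return [[(v + mean) / 2 for v in row] for row in block]

image = [[r * 16 + c for c in range(16)] for r in range(16)]
patches = split_into_patches(image)        # four 8 x 8 patches
restored = tile_patches(patches, 16, 16)   # tiling back recovers the image
smoothed = process_patchwise(image, smooth_patch)
```

Keeping the patch position as the dictionary key is one simple way to guarantee that each processed patch returns to its designated location.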

3.3 Segmentation

As the word "segmentation" suggests, the image is divided into different segments. Here, regions are separated based on the characteristics of their pixels in order to identify the tumour. To retrieve the tumour component, the pre-processed image is converted into a binary image [30]. Image segmentation simplifies the representation of an image by assigning borders and objects. It can also be used to count the pixels in a picture [31]. This approach uses both clustering and particle swarm optimisation to improve segmentation. Clustering is a method for grouping data points into clusters of related points: similar items remain in one group that has little to no similarity to other groups. K-means clustering and hierarchical clustering are two popular clustering techniques [32]. K-means clustering is used in this method because it is simple and provides initial centroid positions.

3.3.1 K-means clustering

The K-means clustering method can be used to solve clustering problems effectively [33]. In this image segmentation step, this unsupervised learning method is used to separate the Region-of-Interest (RoI) from the background. The method generates \(k\) clusters from the input pixel set of an image of size \(X \times Y\), where \(x\) and \(y\) index the rows and columns, respectively. The input pixels to be clustered are \(\mathrm{n}(\mathrm{x},\mathrm{ y})\), with \({\mathrm{o}}_{i}\) serving as the centre of cluster \(i\). The nearest cluster for each pixel is determined using Eq. 1.

$$\mathrm{d }= ||\mathrm{ n}(\mathrm{x},\mathrm{ y}) - {\mathrm{o}}_{i} ||$$
(1)

where \(\mathrm{d}\) is the distance between each pixel in the image and the cluster centre \({\mathrm{o}}_{\mathrm{i}}\). Based on this distance, each pixel is assigned to its nearest centre \({\mathrm{o}}_{i}\). The cluster centres are then recalculated using Eq. 2 until the stopping criteria are satisfied.

$$\mathrm{f}(\mathrm{j})=\mathrm{min}\left({\sum }_{j=1}^{c}{\sum }_{i=1}^{c}\Vert \mathrm{n}\left(\mathrm{x},\mathrm{y}\right)-{\mathrm{o}}_{\mathrm{i}}\Vert \right)$$
(2)

where \(\mathrm{f}(\mathrm{j})\) is the fitness value, the number of clusters ranges from 1 to \(c\), and the number of cases ranges from 1 to \(c\). The letter \(c\) indicates the quantity of datasets. This function depends on a distance function computed between the \(j\)th cluster and the \(i\)th case. Each cluster's centroid before and after clustering is shown in Fig. 2.

Fig. 2 K-means clustering

The algorithm's steps are listed in figure a.

Using the image matrix of the input CT image, the k-means technique divides all pixels into two categories, ROI 1 (foreground) and ROI 2 (background). The first step is to separate the foreground and background of the CT image using the Euclidean distance of each pixel to the centroids C1 and C2, where C1 and C2 are the respective means of ROI 1 and ROI 2. Each pixel is assigned to the foreground or the background according to its distance from C1 and C2. If a pixel value differs from the background, it is almost certainly foreground, because C1 captures the foreground's ground truth through this distance.
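The two-cluster assignment and update loop can be illustrated with a short Python sketch operating on grey levels only. It is a simplified illustration under stated assumptions, not the implementation used in this work: the initial centres (0 and 255), the use of absolute grey-level difference as the distance, and the function name `kmeans_two_clusters` are all chosen for the example.

```python
def kmeans_two_clusters(pixels, c1, c2, max_iter=100):
    """Two-cluster k-means on a flat list of grey levels.

    Each pixel joins the nearer centre (the distance of Eq. 1), then each
    centre is recomputed as the mean of its cluster until nothing changes.
    """
    for _ in range(max_iter):
        roi1 = [p for p in pixels if abs(p - c1) <= abs(p - c2)]
        roi2 = [p for p in pixels if abs(p - c1) > abs(p - c2)]
        new_c1 = sum(roi1) / len(roi1) if roi1 else c1
        new_c2 = sum(roi2) / len(roi2) if roi2 else c2
        if new_c1 == c1 and new_c2 == c2:
            break  # centres are stable: stopping criterion met
        c1, c2 = new_c1, new_c2
    return c1, c2, roi1, roi2

# Hypothetical grey levels: a dark group and a bright group.
pixels = [12, 15, 10, 14, 200, 210, 205, 13, 198]
c1, c2, roi1, roi2 = kmeans_two_clusters(pixels, c1=0.0, c2=255.0)
```

With these values the dark pixels gather around a centre of 12.8 and the bright pixels around 203.25, after which the assignment no longer changes.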

The disadvantage of k-means is that it only distinguishes between foreground and background values in an image. Because it relies on exact pixel values, foreground and background pixels may be interchanged, and where some regions share the same pixel values, the assignment may not be accurate for either class. To address this concern, the proposed model investigates a novel PSO behaviour.

3.3.2 Segmentation with k-means and PSO

Particle swarm optimisation (PSO) was created by Eberhart and Kennedy as an evolutionary optimisation technique and is applied here to image segmentation [34]. Its decisions are based on the behaviour of the particles in the swarm, and it is combined with the deep learning architecture in this work. Using a fitness function, the algorithm moves through the search space and tracks the coordinates of candidate solutions. PSO is a swarm-based metaheuristic that is used together with unsupervised clustering to improve image segmentation accuracy.

The velocity and position of the \(i\)th PSO particle are initialized using Eq. 3.

$${x}_{i}=\left({y}_{i1},{y}_{i2},\ldots ,{y}_{iD}\right)$$
(3)

where \({y}_{iD}\) stands for the \(D\)th cluster centre in the \(i\)th particle solution. As a result, a large number of individuals are available to group. Eq. 4 is used to assess the particle's fitness:

$$f(j)=\frac{{\sum }_{j=1}^{c}{\sum }_{i=1}^{c}\Vert n\left(x,y\right)-{o}_{j}\Vert }{{N}_{x,y}}$$
(4)

where \({N}_{x,y}\) is a real number giving the size of the data set. Minimizing the dispersion of the clusters reduces the objective function. If the iteration count exceeds the predetermined number of repetitions, denoted by "Its", the following steps are skipped. Otherwise, the best swarm particle's position vector is stored and the velocity is updated using Eq. 5 [35]:

$${h}_{iD}={h}_{iD}+{a}_{1}*rand()\left({p}_{iD}-{x}_{iD}\right)+{w}_{1}*rand()\left({I}_{D}-{x}_{iD}\right)$$
(5)

where \({h}_{iD}\) is the velocity of the \(i\)th pixel in the \(D\)th dimension and \(I\) is the index of the best pixel in the group. The position is then updated using Eq. 6.

$${x}_{iD}={x}_{iD}+{h}_{iD}$$
(6)

If \({x}_{iD}\) falls outside the \(D\)-dimensional search range, it is reset using Eq. 7.

$${x}_{iD}=\left({x}_{min}, {x}_{max}\right)$$
(7)

In other words, the minimum and maximum bounds are used to reset the position and velocity of a particle that leaves the search range. Eq. 8 is then used with the inertia weight \(W\), which governs how a particle's previous velocity affects its current velocity:

$${H}_{it}=W*{H}_{i\left(t-1\right)}+{A}_{1}*{R}_{1}*\left({p}_{i}-{x}_{i\left(t-1\right)}\right)+{A}_{2}*{R}_{2}*\left({g}_{i}-{x}_{i\left(t-1\right)}\right)$$
(8)

where \({A}_{1}\) and \({A}_{2}\) are acceleration coefficients that control the maximum step size over the course of the iterations, and \({R}_{1}\) and \({R}_{2}\) are independent, uniformly distributed random variables. The PSO algorithm examines each pixel value in the foreground and background separately. The proposed method introduces random speed behaviour between the particles. The fitness function of the PSO algorithm takes as inputs the PSO particle and its associated velocity, as well as the allowed velocity. The PSO particle is the foreground or background pixel value, depending on the situation.
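The update rules of Eqs. 6 to 8 can be made concrete with a minimal Python sketch that searches for grey-level cluster centres. Several points are assumptions rather than details taken from this work: \({p}_{i}\) and \({g}_{i}\) are read as the personal-best and swarm-best positions (the usual PSO convention), Eq. 7 is read as clamping a position to \([{x}_{min},{x}_{max}]\), the fitness is the mean distance to the nearest centre in the spirit of Eq. 4, and all parameter values (`w`, `a1`, `a2`, swarm size, iterations) are illustrative.

```python
import random

def fitness(centres, pixels):
    """Mean distance from each pixel to its nearest centre (cf. Eq. 4)."""
    return sum(min(abs(p - o) for o in centres) for p in pixels) / len(pixels)

def pso_cluster_centres(pixels, n_clusters=2, n_particles=10, iterations=50,
                        w=0.7, a1=1.5, a2=1.5, x_min=0.0, x_max=255.0, seed=0):
    rng = random.Random(seed)
    # Each particle is a candidate vector of cluster centres (Eq. 3).
    x = [[rng.uniform(x_min, x_max) for _ in range(n_clusters)]
         for _ in range(n_particles)]
    h = [[0.0] * n_clusters for _ in range(n_particles)]  # velocities
    p_best = [xi[:] for xi in x]
    p_fit = [fitness(xi, pixels) for xi in x]
    g = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_clusters):
                r1, r2 = rng.random(), rng.random()
                # Velocity update with inertia weight (Eq. 8).
                h[i][d] = (w * h[i][d]
                           + a1 * r1 * (p_best[i][d] - x[i][d])
                           + a2 * r2 * (g_best[d] - x[i][d]))
                # Position update (Eq. 6), clamped to the range (Eq. 7).
                x[i][d] = min(x_max, max(x_min, x[i][d] + h[i][d]))
            f = fitness(x[i], pixels)
            if f < p_fit[i]:                 # new personal best
                p_best[i], p_fit[i] = x[i][:], f
                if f < g_fit:                # new swarm best
                    g_best, g_fit = x[i][:], f
    return g_best, g_fit

pixels = [12, 15, 10, 14, 200, 210, 205, 13, 198]
centres, score = pso_cluster_centres(pixels)
```

The fixed seed only makes the sketch repeatable; in practice the random factors are what let particles explore positions that plain k-means updates would not reach.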

3.4 Feature extraction and classification of lung cancer using CNN

Feature extraction is the process of determining the attributes of the objects in an image. It can also be defined as the process of extracting manageable numerical features from unprocessed data while retaining the substance of the original data. The feature extraction procedure is required to determine whether a picture is normal or abnormal and to depict the outcome. These characteristics form the foundation of the classification process. To improve identification accuracy, undesirable regions must be removed; colour information is not needed because the images are binary. To identify the locations of lung tumours, size and shape characteristics such as area, major and minor axis length, and solidity were examined [36]. The only features thought to be recoverable were average intensity, area, perimeter, and eccentricity [37]. The perimeter is a scalar value giving the number of pixels on the nodule's outline, calculated by summing the connected outline pixels registered in the binary image.

CNN is used to detect and classify lung cancer in hospitalised patients' CT images. CNNs are a subclass of neural networks, which attempt to emulate the way the brain works and learns. A network accepts data, processes it by passing it through multiple layers, and provides an output [38]. The input layer is the layer on the left, while the output layer is the layer on the right. The middle layers are commonly referred to as hidden layers since their values are not visible in the training set; most of the computation takes place there. The more hidden layers a network has between the input and output layers, the deeper it is. Each layer type may appear many times. The order of the layers is not fixed but follows specific guidelines. The goal of a CNN is to convert images into a form that is easier to process while retaining the elements that are critical for making accurate predictions [39]. The network consists of the input layer, hidden layers, and output layer. The hidden layers include convolutional, ReLU (rectified linear unit), pooling, fully connected, and other layers. The convolutional network was built using these layers [40]. The CNN layers are shown in Fig. 3, where K, F and S in the dimensions denote the kernel, filter and stride sizes, respectively.

Fig. 3 CNN Architecture

The CT images from the input layer are sent to the other layers for feature extraction. The convolution layer follows the input layer and applies a number of filters to extract features from the images. During the testing phase, these features are used to make matches. For the convolutional layers in the first block, the filter size (F) and kernel size (K) are 3 × 3 and 128 × 128, respectively. The second block's convolutional layers have the same 3 × 3 filter size as those in the first block and have 256 × 256 filter counts. The max-pooling layers in CNN architecture blocks 1 and 2 apply an interleaved 2 × 2 and 3 × 3 pattern. The blocks of convolutional and pooling layers perform feature extraction from the input data. The hyperparameter values considered for the CNN architecture in this work are shown in Table 1.

Table 1 Hyperparameter values

A feature map (FM) is created by using a filter that scans the entire image a few pixels at a time to forecast the class probabilities for each feature. The data is then sent to the pooling layer, which retains the most important information while reducing the information generated by the convolutional layer for each feature (the convolutional and pooling layer processes are typically repeated multiple times). The fully connected layer is then applied. It assigns weights to the inputs from the feature analysis in order to predict a suitable label and generates the final classification probabilities for the image [41, 42]. The dimensions of the FM can be calculated using Eqs. 9 and 10.

$${H}_{o}=\frac{{H}_{i}-Fh+2p}{s}+1$$
(9)
$${W}_{o}=\frac{{W}_{i}-Fw+2p}{s}+1$$
(10)

where \({H}_{o}\) and \({W}_{o}\) represent the FM's height and width as it emerges from the convolution layer, and \({H}_{i}\) and \({W}_{i}\) represent the FM's height and width as it enters the convolution layer. \(Fh\) and \(Fw\) are the kernel height and width, \(p\) is the padding around the input image, and \(s\) is the filter's step size (stride) over the input array. The number of weights \(w\) in the fully connected layer is calculated using Eq. 11.

$$Fully\;connected\;layer\;w=\left(n+1\right)*m$$
(11)

where \(m\) denotes the number of output nodes and \(n\) denotes the number of inputs to the layer. A convolutional neural network is a deep learning system that takes an image as input and uses learnable weights and biases to distinguish between different items in the image. Figure 4 depicts the CNN's lung cancer diagnosis architecture.
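As a worked illustration of Eqs. 9 to 11, the following Python sketch evaluates the formulas for a few layer settings. The input size, padding, stride and node counts are illustrative assumptions, not the values of Table 1, and the use of integer (floor) division for non-divisible sizes is a common convention assumed here.

```python
def conv_output_size(size_in, kernel, padding, stride):
    """Eqs. 9 and 10: output height or width of a convolution layer.

    Floor division is an assumption for inputs the stride does not
    divide evenly.
    """
    return (size_in - kernel + 2 * padding) // stride + 1

def fully_connected_weights(n_inputs, m_outputs):
    """Eq. 11: (n + 1) * m, the +1 accounting for one bias per output node."""
    return (n_inputs + 1) * m_outputs

# 3 x 3 kernel, padding 1, stride 1 keeps a 128-pixel side unchanged.
h_out = conv_output_size(size_in=128, kernel=3, padding=1, stride=1)
# The same formula applied to a 2 x 2 pooling window with stride 2.
pooled = conv_output_size(size_in=h_out, kernel=2, padding=0, stride=2)
# A layer with 256 inputs and 2 output nodes (benign / malignant).
fc = fully_connected_weights(n_inputs=256, m_outputs=2)
```

Here the convolution preserves the 128-pixel side, the pooling step halves it to 64, and the fully connected layer holds 514 weights including biases.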

Fig. 4 CNN architectures for lung cancer detection

3.5 Attributes calculation

The outcomes of medical image segmentation are assessed using performance metrics. Parameters such as sensitivity, specificity and accuracy are defined below and used for the performance analysis [43, 44].

3.5.1 Sensitivity

The ability of a test to precisely identify every person who is "true-positive" for the disease is referred to as sensitivity. It is evaluated by using Eq. 12.

$${s}_{e}=\frac{{Tr}_{P}}{{Tr}_{P}+{Fa}_{N}}$$
(12)

3.5.2 Specificity

Specificity is defined as a test's ability to detect "true negatives," or people who do not have the disease. It is evaluated by using Eq. 13.

$${s}_{P}=\frac{{Tr}_{N}}{{Tr}_{N}+{Fa}_{P}}$$
(13)

3.5.3 Accuracy

It measures how well a value corresponds to the available data. It is evaluated by using Eq. 14.

$${A}_{e}=\frac{{Tr}_{P}+{Tr}_{N}}{{Tr}_{P}{+Tr}_{N}+{Fa}_{P}+{Fa}_{N}}$$
(14)

where, \({Tr}_{P}\) is true positive, \({Tr}_{N}\) is true negative, \({Fa}_{P}\) is false positive and \({Fa}_{N}\) is false negative.
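Eqs. 12 to 14 translate directly into code. The Python sketch below evaluates them for a hypothetical confusion matrix; the counts are invented for illustration and are not results of this work.

```python
def sensitivity(tp, fn):
    """Eq. 12: fraction of diseased cases correctly identified."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Eq. 13: fraction of healthy cases correctly identified."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Eq. 14: fraction of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 45 true positives, 40 true negatives,
# 3 false positives and 2 false negatives.
tp, tn, fp, fn = 45, 40, 3, 2
se = sensitivity(tp, fn)
sp = specificity(tn, fp)
acc = accuracy(tp, tn, fp, fn)
```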

3.5.4 MSE

The mean-square error (MSE) and peak signal-to-noise ratio (PSNR) are used to compare image compression quality. The MSE measures the cumulative squared error between the original and compressed images, whereas the PSNR measures the peak error. The lower the MSE, the lower the error. It is evaluated by using Eq. 15.

$$MSE=\frac{\sum_{M,N}[I1(m,n)-I2(m,n){]}^{2}}{M*N}$$
(15)

where \(M\) denotes the number of rows, \(N\) denotes the number of columns, and \(I1\) and \(I2\) are the two images being compared.

3.5.5 PSNR

The PSNR is the peak signal-to-noise ratio in decibels between two images. This ratio is used to compare the quality of the original and modified images. The higher the PSNR, the better the quality of the modified image. It is evaluated by using Eq. 16.

$$PSNR=10\,{\mathrm{log}}_{10}\left(\frac{{R}^{2}}{MSE}\right)$$
(16)

where \(R\) is the maximum possible fluctuation of the input image data type (for example, 255 for 8-bit images) and \(MSE\) is the mean-square error.
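A minimal Python sketch of Eqs. 15 and 16 is given below. The two small images and the peak value \(R = 255\) (8-bit data) are assumptions for illustration; returning infinity for identical images is a convention chosen here because the MSE is then zero.

```python
import math

def mse(img1, img2):
    """Eq. 15: mean of squared pixel differences over an M x N image."""
    rows, cols = len(img1), len(img1[0])
    total = sum((img1[m][n] - img2[m][n]) ** 2
                for m in range(rows) for n in range(cols))
    return total / (rows * cols)

def psnr(img1, img2, peak=255):
    """Eq. 16: peak signal-to-noise ratio in decibels."""
    err = mse(img1, img2)
    if err == 0:
        return math.inf  # identical images: no error to compare against
    return 10 * math.log10(peak ** 2 / err)

original = [[10, 20], [30, 40]]
modified = [[12, 20], [30, 36]]
error = mse(original, modified)       # (4 + 0 + 0 + 16) / 4 = 5.0
quality = psnr(original, modified)    # about 41.1 dB
```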

3.5.6 Entropy

Entropy measures the amount of randomness in the data being processed. It is evaluated by using Eq. 17.

$$Entropy=\sum \left[P\left({Y}_{i}\right)\mathrm{log}\left(\frac{1}{P\left({Y}_{i}\right)}\right)\right]$$
(17)

where \(P\) represents the probability of the random variable \(Y\) at the \(i\)th position.
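Eq. 17 can be evaluated from the grey-level histogram of an image, as in the Python sketch below. Treating each distinct grey level as an outcome \({Y}_{i}\) and using a base-2 logarithm (entropy in bits) are assumptions; the equation itself does not fix the base.

```python
import math
from collections import Counter

def entropy(image, base=2):
    """Eq. 17 over the grey-level histogram of a 2-D list image."""
    pixels = [p for row in image for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    # P(Y_i) = count / total, so log(1 / P) = log(total / count).
    return sum((k / total) * math.log(total / k, base)
               for k in counts.values())

two_level = [[0, 255], [255, 0]]   # two equally likely grey levels
constant = [[7, 7], [7, 7]]        # a single grey level
h_two = entropy(two_level)         # 1 bit
h_const = entropy(constant)        # 0 bits
```

A constant image has zero entropy, while an image whose grey levels are spread evenly has the highest entropy for that number of levels.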

3.5.7 SSI (Structure Similarity Index)

It is a method for predicting the perceived quality in an image. It is evaluated by using Eq. 18.

$$SSI=\frac{\left(2{M}_{X}{M}_{Y}+{C}_{1}\right)\left(2{N}_{XY}+{C}_{2}\right)}{\left({M}_{X}^{2}+{M}_{Y}^{2}+{C}_{1}\right)\left({N}_{X}^{2}+{N}_{Y}^{2}+{C}_{2}\right)}$$
(18)

where \(X\) is the segmented image, \(Y\) is the recovered image, \({M}_{X}\) and \({M}_{Y}\) are the average values of \(X\) and \(Y\), \({N}_{X}^{2}\) and \({N}_{Y}^{2}\) are the variances of \(X\) and \(Y\), \({N}_{XY}\) is the covariance of \(X\) and \(Y\), and \({C}_{1}\) and \({C}_{2}\) are small constants that stabilise the division.
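The structure of Eq. 18 can be checked with a short Python sketch that evaluates it once over the whole image. This is a simplification: practical implementations usually compute the index over local windows and average the results. The constants \((0.01R)^{2}\) and \((0.03R)^{2}\) with \(R = 255\) are commonly used defaults and are assumed here, as is the population (divide by the pixel count) form of the variance.

```python
def ssi(x, y, peak=255, k1=0.01, k2=0.03):
    """Eq. 18 evaluated globally over two equally sized 2-D list images."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((a - mx) ** 2 for a in xs) / n
    var_y = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2  # assumed stabilisers
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2))

a = [[10, 20], [30, 40]]
b = [[40, 30], [20, 10]]   # same grey levels in reversed order
same = ssi(a, a)           # identical images give 1
different = ssi(a, b)      # structurally opposite images score lower
```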

3.5.8 Correlation

Correlation is the statistical measure of the relationship between two variables in an image. It is evaluated by using Eq. 19.

$$Cov\left(O,S\right)=\mathrm{E}\left[\left(\mathrm{O}-\mathrm{E}(\mathrm{O})\right)\left(\mathrm{S}-\mathrm{E}(\mathrm{S})\right)\right]$$
(19)

where \(Cov(O,S)\) denotes the correlations of neighboring pixels between the segmented image \(\mathrm{S}\) and the initial image \(\mathrm{O}\).
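The covariance of Eq. 19 can be computed as in the Python sketch below. The normalised correlation coefficient (covariance divided by the product of the standard deviations) is added as an assumption, since it is the usual way of turning a covariance into a value between -1 and 1; the equation itself stops at the covariance.

```python
import math

def covariance(o, s):
    """Eq. 19: E[(O - E(O)) (S - E(S))] over two 2-D list images."""
    os_ = [p for row in o for p in row]
    ss_ = [p for row in s for p in row]
    n = len(os_)
    mean_o, mean_s = sum(os_) / n, sum(ss_) / n
    return sum((a - mean_o) * (b - mean_s) for a, b in zip(os_, ss_)) / n

def correlation(o, s):
    """Assumed normalisation: covariance over the product of std deviations."""
    return covariance(o, s) / math.sqrt(covariance(o, o) * covariance(s, s))

original = [[10, 20], [30, 40]]
inverted = [[255 - p for p in row] for row in original]
cov_self = covariance(original, original)   # equals the variance, 125.0
corr_self = correlation(original, original)
corr_inv = correlation(original, inverted)
```

An image correlates perfectly with itself and perfectly negatively with its grey-level inverse, which gives a quick sanity check for the implementation.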

3.5.9 Processing time

Processing time measures how long the proposed algorithm takes to detect and categorise lung cancer in CT images. It is determined using the MATLAB tic and toc commands.
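For readers working outside MATLAB, the same tic/toc pattern can be sketched in Python with the standard `time.perf_counter` function. The helper `timed` and the timed workload (a simple sum) are hypothetical stand-ins for the detection pipeline.

```python
import time

def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()              # plays the role of tic
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start    # plays the role of toc
    return result, elapsed

# Hypothetical workload standing in for detection and classification.
result, seconds = timed(sum, range(1_000_000))
```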

4 Simulation results

In this work, a novel method based on deep learning and optimization is proposed to detect and classify lung cancer in CT images. CT images were obtained from public databases, namely LIDC and Kaggle. The work is carried out on nearly 90 images. The obtained lung CT images are fed into the proposed method. Figure 5 shows a few of the input CT lung images. Input image-1 and input image-2 are collected from the LIDC database, whereas input image-3 is collected from the Kaggle database.

Fig. 5 (a) CT lung input image-1, (b) CT lung input image-2 and (c) CT lung input image-3

In general, the acquired input images contain some form of noise, which prevents them from being processed for further analysis. To improve the image quality, the input image was therefore pre-processed using a median filter and patch processing to remove noise. The median filter reduces the intensity variation between neighbouring pixels. It reduces salt-and-pepper noise while preserving the edges of the CT lung input image. Figure 6 shows the median filtered output images for CT lung input images 1, 2 and 3.

Fig. 6 (a) Median filter output of CT lung input image-1, (b) median filter output of CT lung input image-2 and (c) median filter output of CT lung input image-3

Even though the median filter removes noise, some undesirable flaws remain, and these can be eliminated by patch processing. Patch processing replaces selected regions of arbitrary shape with a surface fitted to other arbitrarily shaped regions plus a synthetic noise component. Figure 7 shows the patch processing output for CT lung input images 1, 2 and 3.

Fig. 7 (a) Patch processing output of CT lung input image-1, (b) patch processing output of CT lung input image-2 and (c) patch processing output of CT lung input image-3

The best-quality pre-processed image then goes through the segmentation process to detect the lung tumour. For segmentation, k-means clustering is used in this work. This method divides the given data set into two or more clusters. The efficiency of k-means clustering is determined entirely by the selection of the cluster centres, which in turn is determined by distance. Six clusters were formed from the input CT lung images. Figure 8 shows the clustering output of CT lung input images 1, 2 and 3.

Fig. 8 (a) Clustering image of input image-1, (b) clustering image of input image-2 and (c) clustering image of input image-3

The optimisation technique was used to select the best of the six clusters. PSO was used in this work to accurately select the best cluster for tumour detection: based on the objective value, the position and velocity of the particles were updated to find the minimum distance between clusters. Figure 9 shows the detected output images from PSO.

Fig. 9 (a) Detected output using PSO for input image-1, (b) detected output using PSO for input image-2 and (c) detected output using PSO for input image-3

After obtaining the detected images from the segmentation process, the characteristics are obtained and fed to the classifier for classification.

Deep learning can automatically determine noteworthy features without any human involvement. This is particularly valuable for tasks whose features are difficult to define, such as image classification. These algorithms are capable of handling huge and complicated datasets, which makes deep learning a powerful tool for gaining insights from massive data. However, they sometimes entail training large, complicated neural networks on big datasets, which can be computationally expensive and time-consuming. Depending on the dataset, the time complexity of both the training and testing phases can vary greatly.

CNN is used in this work for feature extraction and classification. Based on the features obtained, the CNN classification determines whether the lung is normal (not affected by a tumour) or abnormal (affected by a tumour). Figure 10 shows the classification output for input image-1, input image-2 and input image-3, respectively.

Fig. 10 (a) Classification output of input image-1, (b) classification output of input image-2 and (c) classification output of input image-3

After classification, the detected images were subjected to objective analysis to determine statistical parameters such as MSE, PSNR, specificity, sensitivity, accuracy, entropy, SSI (Structure Similarity Index) and processing time. Table 2 displays the statistical parameters for input image-1, input image-2 and input image-3.

Table 2 Statistical parameters related to input image-1, input image-2 and Input Image-3

Table 2 shows that the accuracy is 98.910% for input image-1, 99.967% for input image-2 and 98.70% for input image-3. Similarly, the proposed method produces low MSE values of 0.035, 0.031 and 0.037 for input images 1, 2 and 3, respectively. The time taken by the training and testing phases can vary substantially depending on the dataset. The processing times for input images 1, 2 and 3 are 28.72, 28.84 and 31.39, respectively.

The graphical representation of statistical parameters in terms of performance and grey level values are shown in Figs. 11 and 12 respectively.

Fig. 11 Graphical representation of performance attributes

Fig. 12 Graphical representation of grey level attributes

Finally, the proposed method demonstrated high accuracy (99.96%) in the detection and classification of lung tumours in CT images compared with the existing systems listed in Table 3.

Table 3 Comparative results

5 Conclusion

Lung cancer is a severe health issue that affects millions of individuals globally. Currently, a variety of conventional and machine learning techniques are applied in computer-aided detection systems to identify lung cancer in its early stages, but these systems do not provide accurate detection and take a long time to process. In this work, a novel method for identifying lung cancer was developed that employs deep learning algorithms, with a median filter and patch processing used to remove noise in the pre-processing stage for accurate detection.

In future, the system will be trained on larger databases to identify the type of cancer from its size and shape. The system's overall accuracy can be enhanced by using a 3D Convolutional Neural Network and by increasing the number of hidden neurons with a deeper network.