1 Introduction

The inception of “machine learning” dates back to 1959, when Arthur Samuel, a pioneering American figure in artificial intelligence and computer gaming, coined the term. Samuel devised a checkers-playing program that used self-play to enhance its gameplay over time, paving the way for contemporary machine-learning algorithms. Presently, machine learning has become ubiquitous in domains such as natural language processing, image recognition, and recommendation systems. For instance, image recognition algorithms can learn from vast collections of labelled images to recognize objects in new pictures, while natural language processing algorithms can learn from massive datasets of text to recognize speech or translate languages [1]. Machine learning, which falls under the umbrella of artificial intelligence and computer science, involves the development of algorithms and models that enable computers to learn and make predictions or decisions autonomously, without explicit programming instructions. This involves feeding extensive data to an algorithm, enabling it to identify patterns and connections within the data. Machine learning has emerged as a pivotal technology in numerous sectors, such as healthcare, finance, and e-commerce, fundamentally transforming how we analyze data and make informed decisions. However, it also poses several challenges, such as the need for enormous amounts of data and the interpretability of an algorithm's decision-making process. The development of more robust and ethical machine learning systems is an ongoing research area in the field [2, 3].

Machine learning's roots can be traced to the mid-twentieth century, and it has since been applied in a broad spectrum of practical applications. One of the earliest examples of machine learning was Cybertron, an experimental “learning machine” created by Raytheon Company in the 1960s. This device used punched tape memory to analyze sonar data, electrocardiograms, and speech patterns. The Cybertron was repeatedly trained by a human operator to recognize patterns and was equipped with a “goof” button to make it reconsider bad choices. During this period, machine learning research primarily focused on pattern categorization, as demonstrated by the book Learning Machines by Nilsson. Pattern recognition continued to be an area of interest, as highlighted by Duda and Hart in 1973. In 1981, researchers reported a study in which a neural network was trained to recognize a set of 40 characters commonly found on computer terminals, consisting of 26 letters, 10 numbers, and 4 special symbols. This marked a significant development in machine learning and paved the way for future advancements in the field [4].

Modern machine learning has two primary objectives: to group data into categories using pre-existing patterns, and to predict future outcomes based on those patterns. For example, using computer vision and supervised learning, we can train an algorithm to recognize malignant moles based on their appearance. A machine learning algorithm for stock trading could likewise alert the trader to likely future price movements [5].

In summary, machine learning is ideal for:

  • Machine learning can reduce the need for extensive manual tuning or lengthy lists of rules when solving complex issues that existing solutions struggle to handle effectively [6].

  • Machine learning can solve complicated issues that traditional methods cannot realistically address by utilizing the best techniques and adapting to new data in different circumstances [7].

  • Machine learning can extract valuable insights and knowledge from massive amounts of data and complex situations, distilling the information into an understandable form [8] (Fig. 1).

Fig. 1 Overview diagram of machine learning

Machine learning techniques play a pivotal role in cancer classification by analyzing complex and high-dimensional datasets. These algorithms automatically select relevant features or biomarkers from large-scale datasets, improving the accuracy and efficiency of cancer classification models. By identifying hidden patterns and relationships in the data, machine learning algorithms can discover subtle associations between genetic or molecular markers and different cancer types, leading to improved classification accuracy. The algorithms learn from labeled examples to build predictive models, iteratively adjusting their parameters to optimize performance. Ensemble methods, including random forests and gradient boosting, combine multiple models to improve prediction accuracy and enhance the robustness of cancer classification models [9]. Machine learning algorithms can be deployed in real-time systems to provide rapid analysis and decision support for cancer diagnosis and treatment, improving patient care and treatment outcomes. Their application in cancer research and clinical practice enables advancements in precision medicine and personalized treatment strategies. Moreover, machine learning algorithms excel at handling the inherent complexity of cancer datasets, which often contain an overwhelming number of variables and intricate relationships. By employing advanced mathematical and statistical techniques, these algorithms can effectively navigate the vast data landscape, uncovering nuanced patterns and connections that may elude human observation.

The capacity of machine learning algorithms to understand labeled examples is a cornerstone of their success in cancer classification [10]. By leveraging large annotated datasets, these algorithms acquire the ability to recognize subtle variations and distinguish between different cancer types with remarkable accuracy. The iterative process of adjusting parameters and fine-tuning models enables continuous refinement and optimization, leading to ever-improving performance and enhanced diagnostic precision.

Ensemble methods, such as random forests and gradient boosting, provide an additional layer of strength and reliability to cancer classification models [11]. By combining multiple individual models, each with its own strengths and weaknesses, these ensemble methods harness the collective intelligence of diverse algorithms, resulting in more robust and resilient predictions. The collaborative nature of ensemble learning mitigates the risk of overfitting and enhances generalization capabilities, ultimately improving the overall accuracy and stability of cancer classification systems. When integrated into real-time systems, machine learning algorithms become invaluable tools for rapid analysis and decision support in cancer diagnosis and treatment. These algorithms can process massive volumes of data in real time, quickly extracting pertinent information and providing clinicians with evidence-based insights. The timely delivery of accurate and actionable information empowers medical practitioners to make well-informed decisions, customize treatment approaches, and ultimately enhance patient outcomes [12, 13].
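To make the ensemble idea concrete, the following is a minimal, illustrative sketch (not drawn from any surveyed study): a random forest and a gradient-boosting model are combined into a soft-voting ensemble on scikit-learn's built-in breast-cancer dataset, which stands in here for the clinical or genomic data discussed above.

```python
# Minimal sketch: a soft-voting ensemble of random forest and gradient boosting
# on scikit-learn's breast-cancer dataset (a stand-in for real clinical/genomic data).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    voting="soft",  # average the predicted class probabilities of both models
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```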

To conclude, the utilization of machine learning algorithms in categorizing cancer represents a remarkable development in the oncology field. The capacity to analyze complex databases, identify hidden patterns, optimize models, leverage ensemble methods, and provide real-time decision support has revolutionized the way we approach cancer diagnosis and treatment. As we continue to refine and expand these algorithms, their impact on precision medicine and personalized treatment strategies will undoubtedly continue to grow, offering hope and improved outcomes for cancer patients worldwide.

Machine learning techniques continue to evolve, and researchers are exploring new approaches for cancer prediction beyond traditional methods. Here are some novel approaches in clustering, feature selection, and other areas of machine learning in predicting cancer:

  1. Deep Learning and Neural Networks: Deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have gained popularity in cancer prediction. These models have the ability to autonomously acquire pertinent features from intricate medical data, including images and genomic sequences [13, 14].

  2. Transfer Learning: Transfer learning involves leveraging pre-trained models on large datasets from related domains and adapting them for cancer prediction tasks. By utilizing knowledge from other domains, transfer learning can improve forecasting accuracy, especially when the amount of cancer-specific data is limited [15].

  3. Unsupervised Clustering: Unsupervised clustering algorithms help identify distinct cancer subtypes by analyzing patterns of gene expression or other molecular data. This allows for personalized treatment strategies and a better understanding of tumor heterogeneity [16].

  4. Ensemble Methods: Ensemble methods combine various machine learning models into a unified model that generates more precise forecasts. Techniques such as bagging, boosting, and stacking can strengthen and extend cancer prediction models [17].

  5. Feature Selection with Genetic Algorithms: Genetic algorithms can optimize the feature selection process by iteratively selecting subsets of relevant features that maximize the performance of cancer prediction models. This approach helps reduce dimensionality and improve model interpretability [18]; a simplified sketch of this idea appears after this list.

  6. Multi-Omics Integration: Cancer prediction often involves integrating data from multiple sources, such as genomics, transcriptomics, proteomics, and clinical data. Machine learning techniques that can effectively integrate multi-omics data offer a comprehensive view of cancer biology and improve prediction accuracy [19].

  7. Explainable AI: Interpretability is crucial in healthcare applications. Researchers are developing machine learning models with built-in explainability, allowing clinicians to understand the reasoning behind predictions. This helps build trust and facilitates the adoption of machine learning models in clinical practice [20].

  8. Longitudinal Data Analysis: Cancer progression involves temporal changes, and analyzing longitudinal data can provide valuable insights. Machine learning approaches that model the temporal dynamics of cancer, such as recurrent neural networks and hidden Markov models, enable accurate prediction of disease progression and treatment response [21].

These are just a few examples of the evolving approaches in machine learning for cancer prediction. Ongoing research and advancements in the field continue to expand the repertoire of techniques and improve our understanding of cancer biology and treatment.
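As promised in item 5 above, the following is a hypothetical, simplified sketch of genetic-algorithm feature selection: binary chromosomes mark which features are kept, and cross-validated accuracy of a logistic-regression model serves as the fitness function. The dataset, base model, and GA settings are illustrative choices, not those of any surveyed study.

```python
# Hypothetical sketch of genetic-algorithm feature selection:
# binary chromosomes mark kept features; fitness = cross-validated accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features, pop_size, n_generations = X.shape[1], 20, 10

def fitness(mask):
    if mask.sum() == 0:                      # an empty feature subset is invalid
        return 0.0
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X[:, mask.astype(bool)], y, cv=3).mean()

population = rng.integers(0, 2, size=(pop_size, n_features))
for _ in range(n_generations):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[-pop_size // 2:]]   # keep the fittest half
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]      # pick two parents
        cut = rng.integers(1, n_features)                       # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.05                    # small mutation probability
        child[flip] = 1 - child[flip]
        children.append(child)
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print("selected", int(best.sum()), "of", n_features, "features")
```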

1.1 Various Types of Cancer Data Analysis

Various forms of data are commonly used in cancer analysis to gain insights and support sound decisions. The type of data utilized depends on the specific research objectives and the available resources. Here are some commonly used data types for cancer analysis considered in this survey:

  1. Clinical Data: This includes patient-related information such as demographic data, medical history, symptoms, treatment records, laboratory results, pathology reports, and clinical outcomes. Clinical data provides essential insights into patient characteristics and disease progression [22].

  2. Genomic Data: Genomic data involves studying the genetic makeup of cancer cells, including DNA sequencing data, gene expression profiles, and genetic variations. This data helps identify genetic mutations, gene expression patterns, and potential biomarkers for cancer diagnosis, prognosis, and treatment selection [23].

  3. Imaging Data: Imaging data is produced by medical imaging modalities like X-rays, CT scans, MRI scans, and PET scans. These images provide detailed information about tumor location, size, shape, and characteristics, aiding in cancer diagnosis, staging, and treatment planning [24].

  4. Omics Data: Omics data refers to large-scale molecular data, including transcriptomics, proteomics, metabolomics, and epigenomics. These data provide insights into the molecular changes occurring in cancer cells and can help identify novel therapeutic targets and biomarkers [25].

  5. Electronic Health Records (EHR): EHRs contain comprehensive patient information, including medical history, diagnoses, treatments, and outcomes. Mining EHR data allows researchers to study large patient populations and identify patterns and trends related to cancer incidence, treatment response, and patient outcomes [26].

  6. Publicly Available Datasets: Various public databases and repositories provide researchers with access to curated cancer datasets, such as The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), and International Cancer Genome Consortium (ICGC). These datasets enable comparative analyses, validation of findings, and collaborative research [27] (Table 1).

Table 1 Cancer data sets and short Description [28,29,30,31,32,33,34]

1.2 Implications and Uniqueness of this Comprehensive Literature Review Compared with Existing SLRs in Cancer Prediction

The implications and uniqueness of this systematic literature review (SLR) in cancer prediction lie in its distinctive contribution to the understanding of applications and techniques of machine learning in cancer classification. Compared to existing SLRs, this review offers several distinctive features:

  • Comprehensive scope: This SLR encompasses a diverse array of machine learning applications and methodologies specifically focused on cancer classification. It provides a holistic view of the field, covering various data sources, such as imaging, genetic markers, and clinical data, and their utilization in accurately classifying different types of cancer.

  • Actual use cases: The review presents real-world use cases where machine learning algorithms have been implemented on medical data to classify cancer and predict outcomes. By highlighting these practical examples, it demonstrates the efficacy and potential of machine learning in clinical settings.

  • Comparative analysis: A comparative analysis table is provided, enabling readers to compare this SLR with existing ones. This analysis emphasizes the unique aspects and contributions of the current review, such as its focus on specific applications, comprehensive scope, or novel insights, setting it apart from previous studies.

  • Implications for readers: The review's key takeaways and implications provide valuable insights for readers. They emphasize the capacity of machine learning to enhance cancer detection, tailor treatment approaches, and forecast patient outcomes, and they enable individuals to make well-informed choices regarding the implementation of machine learning methods in medical settings.

Overall, the implication and originality of this SLR in cancer prediction stem from its comprehensive scope, real-world use cases, comparative analysis, and actionable insights, making it a valuable resource for researchers, practitioners, and anyone interested in machine learning applications for cancer classification (Fig. 2).

Fig. 2 Concept map of machine learning for cancer classification

1.3 Search Criteria, Inclusion/Exclusion Criteria for Conducting a Review Work

In our review of distinct types of machine learning systems, we used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology to ensure a comprehensive and systematic approach. The PRISMA method consists of four key steps: identification, screening, eligibility, and inclusion. During the identification step, we utilized relevant and important keywords to search for eligible articles, limiting our study to those published in the English language. We also employed a snowballing technique to identify additional relevant articles by examining the references of the selected articles. Next, we conducted a screening process to evaluate the relevance of each article based on our inclusion criteria. We excluded articles that did not meet our criteria or were duplicates, resulting in a final set of eligible articles. In the eligibility step, we evaluated the quality of each eligible article based on its scientific rigor, methodology, and relevance to our review topic. Articles that did not meet our quality criteria were excluded from our analysis. Finally, in the inclusion step, we selected the final set of articles that met all of our criteria and included them in our review.

By utilizing the PRISMA method, we ensured a comprehensive and systematic approach to our review of different types of machine learning systems. This method enabled us to identify and evaluate a high-quality set of articles that provided relevant and informative insights into the field of machine learning (Fig. 3) (Table 2).

Fig. 3 Systematic approach for conducting a literature review

Table 2 Features of Machine Learning in Medical Field [2, 3, 5, 7, 9].

1.4 The Objectives of the Paper are as Follows

  • To systematically review the applications and techniques of machine learning in cancer classification.

  • To assess the efficacy and precision of machine learning algorithms in differentiating and classifying various types of cancer.

  • To identify the potential of machine learning in predicting patient outcomes and personalizing cancer treatment approaches.

  • To provide insights and implications for the application of machine learning methods in clinical environments to enhance cancer diagnosis and treatment.

1.4.1 Outline of the Paper

This research paper is divided into six sections that make it easy to follow and understand. Section 1 explains why it is important to use machine learning in diagnosing and classifying cancer, describes the different ways researchers analyze cancer data, and outlines how the review was conducted. Section 2 discusses the different categories of machine learning systems and explains how they work. Section 3 looks at the advantages and disadvantages of using machine learning in cancer research. Section 4 covers how machine learning is applied to classify different types of cancer. Section 5 goes into more detail about machine learning, including its potential, its challenges, and future directions for research in this field. Finally, Sect. 6 concludes the paper, summarizes the findings, explores their implications, and provides suggestions for future research and application. Overall, this paper shows how machine learning can help with cancer diagnosis and opens up possibilities for future advancements.

2 Different Categories of Machine Learning Systems

Purpose: Discussing different machine learning categories helps to understand their applications and choose the most suitable approach for specific problems. It also drives research, promotes interdisciplinary learning, and aids in addressing ethical concerns.

2.1 Learning with Guidance (Supervised Learning)

In supervised learning, we provide a model with a training dataset of labelled instances. The algorithm leverages this labelled data to learn the relationship between the input variables (features) and the output (labels). During training, the algorithm attempts to identify the underlying pattern or relationship between the input and output variables by minimizing an error or loss function. The ultimate objective is to generalize this relationship to previously unknown data, i.e., to generate accurate predictions on previously unseen input data. Supervised learning finds utility in diverse domains, such as image classification, audio identification, natural language processing, and predictive modeling. In image classification, for example, the input variables may be the pixel values of a photograph, while the output variable could be the associated label or class of the object in the image. The beauty of supervised learning is its ability to learn from labeled data and generalize to new, unseen examples. This is achieved through the application of different learning algorithms, such as decision trees, neural networks, support vector machines, and linear regression, to name a few. These algorithms are capable of handling large and complex datasets, and can be trained on a variety of input and output data types, including numerical, categorical, and textual data [35].

In summary, supervised learning is a powerful technique that enables machines to learn from labelled data and make accurate predictions on fresh, previously unseen samples. It serves as a fundamental technique in machine learning and finds extensive usage in a wide range of practical applications (Fig. 4).

Fig. 4 Supervised Learning workflow [36]

There are two types of supervised learning:

2.1.1 Classification

Classification constitutes a fundamental undertaking in supervised learning, where the objective is to train a model to forecast the class designation of a given input by considering its distinctive attributes. A common example of this is a spam filter, which is designed to classify incoming emails as either spam or legitimate. To train a spam filter, the model is fed a large dataset of example emails along with their corresponding labels, which indicate whether each email is spam or not. The model then uses these examples to learn patterns and features that are indicative of spam, such as specific keywords, phrases, or formatting. Once trained, the model can be used to categorize new incoming emails by analyzing their features and predicting their label. A well-trained spam filter can significantly improve the user experience by reducing the amount of unwanted or malicious email that is received, while also ensuring that legitimate emails are not mistakenly flagged as spam [37].
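As a toy illustration of this workflow (not a production spam filter), a bag-of-words representation combined with a naive Bayes classifier can be trained on a handful of hand-labelled messages:

```python
# Toy sketch of a spam classifier: bag-of-words features + multinomial naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "cheap meds limited offer", "lunch meeting at noon",
          "project report attached", "free prize claim today", "see you at the seminar"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())  # vectorize words, then classify
model.fit(emails, labels)
print(model.predict(["claim your free prize", "agenda for the meeting"]))
# expected on this toy data: ['spam' 'ham']
```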

Cancer classification is the process of categorizing different types of cancers based on their characteristics, such as the site of origin, histological features, genetic mutations, and clinical behavior. Accurate classification of cancer plays a significant part in ensuring precise diagnosis, treatment planning, and predicting patient outcomes [38].

Here are some common methods used for cancer classification:

Histopathology: This involves examining cancerous tissue samples under a microscope to analyze their cellular and tissue characteristics. Pathologists classify tumors based on their morphology, including cell type, degree of differentiation, and tissue architecture [39].

Immunohistochemistry (IHC): IHC uses specific antibodies to identify and classify cancer cells based on the presence or absence of particular proteins or markers. It helps determine the origin of the tumor and can provide information about potential therapeutic targets [40].

Molecular profiling: This approach involves analyzing the genetic and molecular alterations within cancer cells. Techniques such as DNA sequencing, gene expression profiling, and proteomics can identify specific mutations, gene amplifications, or changes in gene expression patterns. Molecular profiling can help classify tumors into different subtypes and guide targeted therapies [41].

Imaging techniques: Imaging modalities like Computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) scans offer valuable insights into the position, dimensions, and scope of tumors, furnishing crucial information in the process. Radiologists use imaging findings to classify cancers and determine the stage of the disease [42].

Classification models: Machine learning and artificial intelligence algorithms can be trained using large datasets to develop predictive models for cancer classification. These models can incorporate various data types, such as clinical information, imaging data, and molecular profiles, to classify tumors and assist in diagnosis and treatment decisions [43].

To further illustrate the concept of classification, let’s consider a few more examples:

Image Classification: In image classification, a model undergoes training to anticipate or forecast the category of a given image based on its visual features. For instance, a model can be trained to classify images of animals, such as cats and dogs. The model is fed an extensive collection of labeled images of cats and dogs, which it uses to learn features such as fur texture, shape, and size. Once trained, the model may be used to categorize fresh dog and cat photographs [44].

Sentiment analysis: Sentiment analysis involves classifying text data based on the sentiment expressed in it. For example, a model can be trained to predict whether a given movie review is positive or negative. The model is fed a large dataset of labeled movie reviews, which it uses to learn patterns in the language that are associated with positive or negative sentiment. The model, once trained, may be used to categorize fresh reviews based on their sentiment [45].

Fraud Detection: In fraud detection, a model is trained to identify fraudulent transactions in a given dataset. For instance, a bank can train a model to detect fraudulent credit card transactions by analyzing patterns in the data, such as unusual spending behavior or geographical location. Once trained, the model can be used to flag potentially fraudulent transactions in real-time [46].

In all of these examples, the key to successful classification is the model's capacity to extract meaningful patterns and features from labeled data during the training phase. These patterns and traits may then be utilized to produce precise predictions on fresh, previously unknown data.

Regression: Regression is a popular approach for predicting the value of a numerical attribute using a collection of input characteristics or predictors. For example, estimating the price of an automobile based on factors such as mileage, age, brand, and model is a common use of regression. To train a regression model for this task, we need a set of examples that include both the labels (i.e., the prices of the cars) and their predictors (i.e., the mileage, age, brand, and model). Once the model has been trained, it can predict the cost of a vehicle based on its characteristics [47].

Regression can be applied to various domains, such as healthcare, finance, transportation, marketing, and education. In healthcare, for example, based on their medical history and other criteria, regression can be used to forecast the chance of a patient getting an illness. In finance, regression analysis can be employed to make predictions about stock prices by considering various market indicators. In transportation, regression can be used to predict the duration required for a vehicle to cover a specific distance based on various factors such as weather conditions, traffic, and road type [48].

Overall, regression is a valuable tool for predicting numerical values based on input features, and it has numerous applications in different domains, including the car industry, where it can help car dealerships estimate the value of trade-ins or buyers compare the prices of different cars based on their features [49].
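A minimal sketch of the car-price example, with made-up records rather than real market data, fits a linear regression on mileage and age to estimate price:

```python
# Hypothetical sketch: predicting car price from mileage (km) and age (years).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[30000, 2], [60000, 4], [90000, 6], [120000, 8], [150000, 10]])  # mileage, age
y = np.array([22000, 18000, 14500, 11000, 8000])                               # made-up prices

reg = LinearRegression().fit(X, y)
print("coefficients:", reg.coef_, "intercept:", reg.intercept_)
print("predicted price for 75000 km, 5 years:", reg.predict([[75000, 5]])[0])
```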

2.1.2 Some of the Most Important Supervised Algorithms

KNN: Machine learning techniques such as K-Nearest Neighbours (KNN) are used for both classification and regression problems. Its operation is based on selecting the “k” closest neighbors to an object in the training dataset, where “k” is a positive integer of the user’s choosing. The algorithm determines the distance between the object and the rest of the dataset’s objects, and then selects the “k” nearest neighbors based on their proximity. After identifying the “k” nearest neighbors, the object is classified by assigning it to the class that is most prevalent among its neighbors. In regression tasks, the algorithm predicts the average value of the “k” nearest neighbors as the anticipated value for the object [50].

Overall, KNN is a straightforward algorithm that can be easily implemented and utilized in a diverse range of applications in machine learning.
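A minimal from-scratch sketch of the KNN classification rule described above (Euclidean distance, then a majority vote among the k nearest training points; the toy data and labels are invented for illustration):

```python
# From-scratch K-Nearest Neighbours classification: distance, sort, majority vote.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    distances = np.linalg.norm(X_train - x_new, axis=1)    # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]                    # indices of the k closest neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]  # most frequent class among them

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9]])
y_train = np.array(["benign", "benign", "malignant", "malignant"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # -> "benign"
```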

LR: Logistic Regression is a widely adopted statistical technique employed to analyze datasets that encompass one or more independent variables, which have the potential to influence the outcome. It is widely used for binary classification tasks in which one of two possible outcomes must be predicted, for example, determining whether a patient is healthy or sick, or whether a candidate will pass or fail an exam. The relationship between the independent variables and the dependent variable is described using a logistic function, which converts the projected output to a number between 0 and 1. This value represents the probability that the dependent variable takes the positive outcome. Maximum likelihood estimation is used to train the logistic regression model; the procedure entails finding the parameter values that maximize the likelihood of observing the training data. Once trained, the model can be employed to predict outcomes for new data points. This is done by inputting the independent variable values into the logistic function and calculating the estimated probability of obtaining a positive outcome [51].
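A brief sketch of binary classification with logistic regression; scikit-learn's solver performs the maximum-likelihood fitting described above, and the breast-cancer dataset is used purely as an example:

```python
# Sketch: logistic regression outputs a probability between 0 and 1 for the positive class.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

print("P(positive class) for the first sample:", clf.predict_proba(X[:1])[0, 1])
print("predicted label:", clf.predict(X[:1])[0])
```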

LR: Linear Regression is a statistical method for building a model that represents the relationship between one or more independent variables and one or more dependent variables. The main objective is to predict the value of the dependent variable based on the values of the independent variables. Linear regression is widely employed across various disciplines to comprehend the association between variables and make predictions. The model is built by finding a linear equation that best fits the data, enabling estimation of the dependent variable when provided with the values of the independent variables [52].

Finding the best line to depict the relationship between the independent and dependent variables is the goal of linear regression. The equation Y = ax + b, where Y stands for the dependent variable, X for the independent variable, a for the line’s slope, and b for the intercept, represents the line of best fit. It is possible to use linear regression for both simple linear regression, which involves a single independent variable, and multiple linear regression, which encompasses multiple independent variables. The linear regression model’s parameters are estimated using a number of strategies, including the ordinary least squares method and gradient descent. Once the parameters have been computed, the linear regression model can be utilized to predict new data points by inputting the values of the independent variables into the equation and computing the anticipated or expected value of the dependent variable [53].
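The line of best fit Y = ax + b can be recovered directly with the ordinary least squares formula mentioned above; a small sketch with synthetic data:

```python
# Ordinary least squares for simple linear regression: estimate a (slope) and b (intercept).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])          # roughly y = 2x with noise

a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
b = y.mean() - a * x.mean()                                                 # intercept
print(f"fitted line: y = {a:.2f}x + {b:.2f}")
print("prediction at x = 6:", a * 6 + b)
```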

SVM: A supervised machine learning method used for classification and regression analysis is the support vector machine (SVM). SVM is a powerful and widely used method because it can handle data that cannot be linearly separated by translating it into a higher-dimensional space in which a linear boundary separates the classes. In SVM, the purpose is to select the optimal hyperplane for classifying the data: the hyperplane is chosen to lie as far as possible from the nearest data points of each class. The distance between the hyperplane and the nearest data points, known as the margin, is a crucial concept in SVM. By identifying the hyperplane that is farthest from the nearest data points of each class, called support vectors, the margin is maximized. This technique allows SVM to handle both linear and nonlinear data by utilizing the kernel method, which transforms the data into a higher-dimensional space; a linear boundary can be created in this transformed space to separate the classes. After locating the hyperplane, SVM can be applied to categorize new data points by assessing which side of the hyperplane they lie on. If a new data point is positioned on one side of the hyperplane, it is assigned to one class; if it falls on the other side, it is assigned to the other class [54].
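A compact sketch of this idea: an SVM with an RBF kernel separates data that is not linearly separable in the original space (scikit-learn's two-moons toy dataset is used only as an illustration):

```python
# Sketch: a kernel SVM classifying non-linearly separable data (two interleaved half-moons).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", C=1.0, gamma="scale")  # RBF kernel maps data to a higher-dimensional space
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
print("support vectors per class:", svm.n_support_)
```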

DT: The Decision Tree is a widely used algorithm utilized in machine learning and artificial intelligence to address classification and regression tasks. It takes the shape of a tree, describing a series of decisions and their accompanying results. Within a decision tree, an internal node signifies an attribute test, a branch signifies the result of the test, and a leaf node represents a prediction or class label. The construction of the tree involves recursively dividing the data into smaller groups using attribute values and selecting the split that generates the most consistent subsets (i.e., subsets with the highest proportion of instances belonging to the same class) [55].

The process of constructing a decision tree is iterated until certain stopping criteria are met, such as reaching a minimum number of instances in a leaf or a maximum tree depth. The resulting tree can be used to produce predictions for new data points by following the path from the root to a leaf node, basing each branching decision on the outcome of the test at the corresponding internal node. Because they can handle both numerical and categorical data and are easily interpretable, decision trees are frequently employed. They are also relatively fast to train and can handle large datasets. However, they are prone to overfitting, particularly when the tree becomes too deep, and can benefit from pruning or ensemble methods, such as random forests [56].
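A short sketch showing a depth-limited decision tree and its human-readable rules; the depth limit stands in for the stopping/pruning criteria mentioned above, and the iris dataset is used only as an example:

```python
# Sketch: a shallow decision tree on the iris dataset, printed as if/else rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)  # max_depth acts as a stopping criterion
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))
print("training accuracy:", tree.score(iris.data, iris.target))
```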

RF: Random Forest is an ensemble learning method for classification and regression. It is a decision-tree-based method that generates numerous decision trees and combines their predictions to obtain a more reliable and accurate outcome. Each tree in a random forest is produced using a random subset of the data and a random subset of the attributes. This process is repeated multiple times to build multiple trees, and the estimates made by the individual trees are combined through a majority vote (for classification) or an average (for regression). Random forests offer several advantages over an individual decision tree. By combining the forecasts from many trees, they typically lessen overfitting and increase model accuracy and stability. They can also manage complex, non-linear connections between the attributes and the desired outcome, and are capable of handling a combination of numerical and categorical features. Another key benefit of random forests is that they remain relatively easy to understand, as feature importance can be estimated from the trees, and decision trees themselves are readily interpretable. Despite these advantages, random forests may be computationally expensive to train, particularly for large databases and large numbers of trees, and they might not be the optimal option for datasets with a very high number of dimensions. However, for many problems, they provide a good balance between accuracy and interpretability, making them a popular choice for many machine learning practitioners [57].
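The feature-importance estimate mentioned above can be read directly from a fitted forest; a brief, illustrative sketch on the breast-cancer dataset:

```python
# Sketch: a random forest and its estimated feature importances.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(data.data, data.target)

top = np.argsort(forest.feature_importances_)[::-1][:5]   # five most informative features
for i in top:
    print(f"{data.feature_names[i]:25s} importance = {forest.feature_importances_[i]:.3f}")
```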

NN: A machine learning system called a neural network is modelled after the way the human brain is structured and functions. It resembles a kind of synthetic neural network made up of several linked nodes, or artificial neurons, organized into layers.

Information is passed through numerous layers in a neural network, where each layer performs a mathematical operation on the data and feeds its result to the subsequent layer in the sequence. The final layer, called the output layer, produces the model's predictions. During training, the model's parameters, referred to as weights and biases, are adjusted to minimize the disparity between the predicted and observed outputs. Neural networks excel at handling intricate and non-linear connections between input and output variables, making them applicable to various tasks like image classification, natural language processing, and time series forecasting. Furthermore, they can be combined with other machine learning techniques, like decision trees, to create hybrid models that leverage the strengths of multiple algorithms. Despite their capabilities, neural networks can be challenging to design, train, and interpret, particularly for large and complex models, and can require significant computational resources. Additionally, they can be prone to overfitting and may require regularization techniques, such as dropout or early stopping, to avoid this. However, breakthroughs in processing power and novel designs, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have resulted in considerable gains in neural network performance and interpretability in recent years [58].
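A minimal sketch of a feed-forward neural network with early stopping (one of the regularization measures mentioned above), using scikit-learn's MLPClassifier rather than a deep-learning framework; the layer sizes are illustrative:

```python
# Sketch: a small multi-layer perceptron with two hidden layers and early stopping.
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
mlp = make_pipeline(
    StandardScaler(),                                  # neural nets train better on scaled inputs
    MLPClassifier(hidden_layer_sizes=(32, 16),
                  early_stopping=True,                 # hold out part of the data to stop training
                  max_iter=1000, random_state=0),
)
mlp.fit(X, y)
print("training-set accuracy:", mlp.score(X, y))
```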

2.1.2.1 Unsupervised Learning

Unsupervised learning is a type of machine learning where algorithms are trained using datasets that lack explicit labels or annotations allowing models to learn from patterns and correlations in the data without the requirement for prior supervision or predetermined classifications. Unlike supervised learning, where the algorithms require both input and output data for training, unsupervised learning focuses only on input data, making it a practical and economical method for analyzing large and complex data sets. Clustering, anomaly detection, and dimensionality reduction are three prominent unsupervised learning approaches. Clustering algorithms bring together similar data points to find underlying relationships and patterns in the data. Anomaly detection is the process of discovering data points that vary considerably from the norm, whereas dimensionality reduction is the process of simplifying data by lowering its complexity and the number of variables [59].

Unsupervised learning finds extensive application across diverse domains, such as computer vision, natural language processing, and data mining. It has many practical applications, such as image and speech recognition, fraud detection, customer segmentation, and recommendation systems. In addition, unsupervised learning has proven useful in scientific research, such as clustering and identifying patterns in genetic data and analyzing social network structures. One of the main advantages of unsupervised learning is that it can be used to discover previously unknown patterns and relationships in data, without relying on human-defined categories or labels. This makes it ideal for applications where the data is too complex or too large to be manually labeled. Additionally, unsupervised learning offers valuable insights into the underlying structure and behavior of the data, enabling researchers to better understand the underlying linkages and trends and make more educated decisions. In conclusion, unsupervised learning is a powerful technique that has numerous applications across a broad range of fields. By leveraging unsupervised learning, researchers and practitioners can gain insights into large and complex data sets, allowing them to identify previously unknown patterns and relationships that can inform decision-making and drive innovation [60] (Fig. 5).

Fig. 5 Unsupervised learning [61]

Unsupervised learning is subdivided into two groups.

2.2 Learning Without Guidance

2.2.1 Clustering

Clustering is a statistical technique that is widely employed in many different domains, including bioinformatics, pattern recognition, machine learning, and many more. Its major purpose is to organise a collection of items so that objects in the same group are more similar to one another than to objects in other groups. Clustering includes the use of different algorithms, each with its own idea of what constitutes a cluster and how to discover them quickly. Clusters are commonly defined as dense regions of data space, groupings with minimal distances between cluster members, intervals, or unique statistical distributions. Clustering might be considered a multi-objective optimisation problem, with the proper method and parameter settings determined by the dataset and intended application of the results. It is not an automatic process; instead, it requires an interactive, multi-objective process of knowledge discovery that involves trial and error. To achieve the desired outcomes, it is often necessary to modify the data preparation and model parameters. In summary, clustering is a valuable exploratory data analysis technique that enables efficient grouping of data for various applications, but it requires careful consideration of the data and a thorough understanding of the algorithms involved to produce useful results [62, 63].

2.2.1.1 Widely used Clustering Algorithms
  I. K-means: K-means clustering technique separates a given collection of data points into K groups based on their similarity. Based on the input data, this unsupervised learning technique determines the best cluster centroids. The objective of the process is to reduce the total sum of squared distances between individual data points and the centroid of the cluster [64]. Here are the steps involved in the K-means algorithm:

    1. Choose K initial cluster centroids at random from the data points.
    2. Assign each data point to the cluster that has the closest centroid to it.
    3. Recalculate each cluster's centroids by calculating the average of the data points within that cluster.
    4. Continue with steps 2 and 3 iteratively until either the cluster assignments remain unchanged or the maximum number of iterations is reached.

The K-means algorithm converges to a local minimum of its objective function through a series of iterations. It is crucial to consider that the initial selection of centroids can impact the final clustering outcome, leading to different cluster assignments and local optima. To address this issue, running the K-means algorithm with multiple initial centroid selections and keeping the best result can yield a better clustering. K-means has found wide application in data science, machine learning, and computer vision for clustering and image segmentation due to its simplicity, scalability, and efficiency. However, it also has limitations, such as the requirement to specify the number of clusters beforehand, sensitivity to the initial centroid selection, and the assumption of spherical cluster shapes and equal cluster sizes. Numerous extensions and variants have been introduced, including hierarchical K-means, fuzzy K-means, and spectral clustering, to improve K-means performance in diverse scenarios and address some of its limitations [65].
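The four steps listed above translate almost line-for-line into code; a hypothetical minimal implementation on randomly generated two-dimensional data:

```python
# From-scratch K-means following the four steps above: init, assign, update, repeat.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])  # two synthetic blobs
K, max_iter = 2, 100

centroids = X[rng.choice(len(X), size=K, replace=False)]        # step 1: random initial centroids
for _ in range(max_iter):
    distances = np.linalg.norm(X[:, None, :] - centroids, axis=2)
    labels = distances.argmin(axis=1)                            # step 2: assign to nearest centroid
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])  # step 3: recompute
    if np.allclose(new_centroids, centroids):                    # step 4: stop when centroids settle
        break
    centroids = new_centroids

print("final centroids:\n", centroids.round(2))
```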

  II. Hierarchical clustering

Hierarchical clustering is an approach used to arrange data points into hierarchical, tree-like structures. Initially, each data point is considered an individual cluster, and subsequently, the clusters that are closest to each other are merged iteratively until a single cluster remains. There are two kinds of hierarchical clustering techniques: agglomerative and divisive. Agglomerative clustering starts with each data point as its own cluster and, at each stage, merges the two nearest clusters. On the other hand, divisive clustering initiates with all data points in a single cluster and progressively divides a cluster into two at each stage. The distance between two clusters is established using distance metrics, such as Euclidean distance, Manhattan distance, or correlation distance, chosen based on the specific data and problem domain. Hierarchical clustering produces a dendrogram, a tree-like diagram that depicts the order of cluster mergers. Each leaf node in the dendrogram corresponds to a data point, while each internal node represents a merged cluster. The distance between the merged clusters is represented by the height of each internal node. The number of clusters to choose can be determined by examining the dendrogram: it corresponds to the cut-off point at which the dendrogram is terminated. The cut-off point is determined by the desired level of granularity and the problem domain [66].

Hierarchical clustering offers several benefits, including its ability to handle non-convex clusters, the freedom of not having to predefine the desired number of clusters in advance, and the ease of interpretation provided by the dendrogram output. However, the agglomerative method can be resource-intensive in terms of computation, particularly when dealing with large datasets, and the choice of distance metric and linkage method can impact the clustering result. There are several linkage methods available, such as single, complete, average, and Ward’s method, each with its strengths and weaknesses, making it important to select the suitable linkage method based on the characteristics of the data and the specific problem domain. Hierarchical clustering finds extensive application in diverse fields, including biology, social science, and computer science, for clustering and classification tasks [67].
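A small sketch of agglomerative clustering with SciPy: build the linkage (here with Ward's method, one of the linkage options mentioned above), then cut the dendrogram at a chosen number of clusters; the data are synthetic:

```python
# Sketch: agglomerative hierarchical clustering and cutting the dendrogram into 2 clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])  # two synthetic groups

Z = linkage(X, method="ward")                    # records the sequence of cluster merges (the dendrogram)
labels = fcluster(Z, t=2, criterion="maxclust")  # cut-off chosen so that 2 clusters remain
print("cluster sizes:", np.bincount(labels)[1:])
```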

  III. Density-based clustering

Density-based clustering is a method for grouping data points in a particular data space based on their density. The technique identifies high-density regions as clusters and distinguishes them from areas of low density. The clusters produced can have diverse shapes and sizes and do not need to be predetermined [68]. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is widely recognized as the most frequently used density-based clustering method.

The process of the algorithm is as follows:

  1. Locate all the neighboring data points within a radius, epsilon, for each data point.
  2. Designate a data point as a core point if it has a minimum number of neighboring points; otherwise, mark it as a noise point.
  3. For each core point and its neighbors, recursively identify all the connected points within epsilon and categorize them as part of the same cluster.
  4. Repeat steps 1 to 3 until all the data points have been visited.

DBSCAN produces a collection of clusters and noise points. The shapes and sizes of the clusters are determined by the connected components of core points and their neighbors, and the term “noise points” refers to data points that are not assigned to any cluster. DBSCAN has several advantages, including the ability to handle non-convex clusters, the lack of a requirement to define the number of clusters in advance, and the capability to tolerate outliers and noise. However, it also has some shortcomings, such as its sensitivity to the choice of distance metric and epsilon parameter and its difficulty in handling clusters with varying densities. To overcome these limitations, various modifications and alternatives to DBSCAN have been developed, such as Ordering Points To Identify the Clustering Structure (OPTICS), Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and the density-based DENCLUE algorithm, each designed to improve performance under different circumstances. Density-based clustering is a prominent approach for clustering and classification in a variety of fields, including computer vision, image segmentation, and anomaly detection [69].
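A brief sketch of DBSCAN with scikit-learn; epsilon and the minimum number of neighbours are the two parameters discussed above, and scikit-learn marks noise points with the label -1 (the dataset and parameter values are illustrative only):

```python
# Sketch: DBSCAN on two noisy half-moons; points labelled -1 are treated as noise.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.07, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)   # eps = neighbourhood radius (epsilon)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", int(np.sum(labels == -1)))
```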

2.2.2 Dimensionality Reduction

Dimensionality reduction is a method of converting complex, high-dimensional data into a simpler form that preserves the most relevant characteristics. This approach attempts to lower the number of variables while preserving the data's underlying structure and connections. The resulting reduced representation should ideally have a dimensionality close to the intrinsic dimensionality of the data, which refers to the fewest parameters necessary to explain its attributes. Dimensionality reduction is an important technique in many industries, since high-dimensional data can be difficult to store, handle, and analyze [70]. Data visualization, classification, and compression, among other things, can benefit from dimensionality reduction, which minimises the number of dimensions. This technique enables researchers and practitioners to gain insights from complex data sets that would otherwise be difficult or impossible to analyse [71]. However, it is important to use dimensionality reduction methods responsibly and to avoid any potential issues related to plagiarism by giving credit to the original authors of any related works [72, 73].

A technique called “dimensionality reduction” (DR) reduces the number of input variables in a dataset before employing machine learning models. It can be performed through either feature extraction or feature selection. Feature extraction reduces the size of the original dataset by removing redundant and unnecessary characteristics while preserving the maximum amount of information. Alternatively, a feature selection algorithm finds the most pertinent subset of characteristics from the input data that are relevant to the given problem. Employing the right DR approach can save time and effort when choosing and extracting important features for analysis [74]. There are several Dimensionality Reduction Techniques (DRTs) that may be used to shorten calculation time and make better use of computer resources. These strategies can be used during the pre-processing stage, prior to data analysis and machine learning model creation. Nevertheless, choosing the best DRT can be difficult, because each approach was designed to preserve specific elements of the original data. Thus, a specific DRT may be suitable for some types of data or applications, but not appropriate for others. Additionally, some DRTs may be created with constraints that limit their scope and use [75]. In conclusion, DRTs offer an efficient way to decrease the number of input variables, via feature selection or feature extraction, prior to employing machine learning models. However, it is crucial to carefully choose the appropriate dimensionality reduction technique based on the data type and the specific application in order to achieve optimal results (Fig. 6).

Fig. 6 Overview of Dimensionality Reduction [76]
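As a hedged illustration of the feature-extraction style of dimensionality reduction discussed above, principal component analysis can compress the 30 breast-cancer features into two components, for example for visualization; the dataset and component count are illustrative choices:

```python
# Sketch: PCA reduces 30 correlated features to 2 components while keeping most variance.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = pca.fit_transform(X)

print("reduced shape:", X_2d.shape)
print("variance explained:", pca.named_steps["pca"].explained_variance_ratio_)
```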

2.2.2.1 Feature Selection Approach for Dimensionality Reduction

Feature selection plays a vital role in various disciplines such as pattern recognition, data mining, and statistical analysis. It involves identifying and selecting the most relevant features from a dataset while eliminating unnecessary or redundant information that can lead to biases or inaccurate models. Feature selection is particularly important when creating models for classification, regression, or clustering tasks. One of the key benefits of feature selection is that it makes it easier to visualize and analyse complex datasets, leading to a more accurate understanding of the underlying patterns and relationships. Additionally, feature selection can result in more compact models that are easier to interpret and have superior generalization capabilities. As a result, feature selection has become an increasingly popular area of research, with numerous methods and techniques developed over the last several decades to address the various challenges involved. Overall, effective feature selection is essential for achieving accurate and reliable results in many different fields of study [77].

Feature selection strategies can be classified into three types depending on the availability of labeled data in the dataset: supervised, semi-supervised, and unsupervised. In supervised feature selection, labeled data is necessary to identify and choose relevant features. Labels can be in the form of categories, ordered values, or real values, and must be assigned to each object in the dataset. Semi-supervised approaches may only require labels for some objects. Unsupervised feature selection algorithms, however, do not rely on labeled data; instead, they use statistical and mathematical methods to analyze the dataset and identify relevant features based on their characteristics and correlations with other features [78]. In recent years, numerous feature selection techniques have been devised, primarily for supervised classification problems. However, due to recent technological advances and an abundance of unlabelled data, unsupervised feature selection (UFS) techniques have garnered considerable attention in the scientific community, particularly in applications such as text mining, bioinformatics, image retrieval, social media analysis, and intrusion detection. Moreover, UFS techniques offer two significant advantages over supervised techniques. First, they are objective and do not rely on prior knowledge, making them suitable for handling new classes of data. Second, they can help lower the danger of data overfitting, which is a common problem with supervised feature selection techniques [79].

Three major methods of feature selection can be distinguished.

  • Filter methods are a category of feature selection techniques that select the most significant features based solely on their characteristics within the dataset. These techniques do not utilize clustering algorithms to guide the search for important features. Instead, filter techniques evaluate each feature's inherent properties to determine its relevance to the target variable. One notable advantage of filter methods is their speed and scalability. Since they do not rely on complex algorithms or iterative processes, filter methods can analyse large datasets quickly and efficiently. Additionally, they may be used on a variety of datasets, including those with high-dimensional features or large numbers of observations. Filter methods employ diverse statistical or mathematical techniques to assess the significance or importance of each feature. Some common approaches include correlation analysis, mutual information, the chi-square test, and information gain. To evaluate relevance, these approaches examine the connection between the target variable and the feature. Correlation analysis, for example, measures the linear relationship between a feature and the target variable, while mutual information measures the extent of dependency between the two variables. Filter methods are widely used in various applications, including bioinformatics, text mining, and image analysis. They provide a simple and effective means of selecting relevant features, which can amplify the efficacy and accuracy of machine learning algorithms. However, filter methods do have limitations, such as the potential for irrelevant or redundant features to be retained, which can negatively impact model performance [80].

  • Wrapper approaches are a popular method used in machine learning to analyse feature subsets by utilizing the findings of a specific clustering algorithm. This method’s main goal is to find feature subsets that can enhance the calibre of the outcomes produced by the grouping method employed in the collection phase. One of the main advantages of wrapper approaches is that they are designed to be highly targeted and specific, resulting in improved accuracy and precision when compared to other approaches. However, this precision often comes at a cost, as wrapper approaches can be computationally expensive and may only be compatible with certain clustering algorithms. Despite their limitations, wrapper approaches are extensively used in many fields, including bioinformatics, image recognition, and natural language processing, where the need for precise feature selection is critical. By combining the findings of clustering algorithms with wrapper approaches, researchers and practitioners can uncover critical insights and improve the overall quality of their data analysis [81].

In conclusion, wrapper approaches are a powerful tool for analysing feature subsets and improving the accuracy and precision of clustering algorithms. While they do have their limitations, their effectiveness in targeted scenarios makes them a valuable asset to any machine learning practitioner’s toolbox.

  • Embedded methods in machine learning aim to balance effectiveness and efficiency when selecting relevant features for a given task. To accomplish this, they combine the benefits of both filter and wrapper approaches. Filter methods use statistical techniques to identify relevant features by measuring their correlation with the output variable. These methods are computationally efficient and can quickly process large datasets, although they might not always capture the complex connections between the features and the target variable. In contrast, wrapper approaches employ a trial-and-error methodology to evaluate the performance of feature subsets. While wrapper methods can capture complex feature interactions, they are computationally expensive and may overfit the data. Embedded methods attempt to strike a balance between these two approaches by integrating feature selection into the model-building process. By identifying relevant features while the model is being built, they reduce both the number of features that must be examined and the risk of overfitting. One popular example of an embedded method is the Lasso algorithm, which uses regularization to shrink the coefficients of irrelevant features to zero. This approach not only identifies relevant features but also performs feature selection during the model-building process, resulting in a more efficient and effective model [82].

In conclusion, embedded methods in machine learning offer a compromise between filter and wrapper methods by integrating feature selection into the model-building process. These techniques seek to balance efficacy and efficiency, resulting in more accurate models that are less prone to overfitting. A brief code sketch covering all three families is given below.
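
To make the distinction concrete, the following minimal sketch applies one representative technique from each family using scikit-learn; the synthetic dataset, the choice of estimators, and the parameter values are illustrative assumptions rather than the methods used in the cited studies.

```python
# Sketch: filter, wrapper, and embedded feature selection with scikit-learn.
# The synthetic dataset and parameter choices below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, n_informative=8, random_state=0)

# Filter: rank features by mutual information with the label and keep the top 10.
X_filter = SelectKBest(score_func=mutual_info_classif, k=10).fit_transform(X, y)

# Wrapper: recursive feature elimination guided by a linear SVM.
X_wrapper = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10).fit_transform(X, y)

# Embedded: L1-regularised logistic regression shrinks irrelevant coefficients
# to zero while the model itself is being fitted.
embedded = SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
X_embedded = embedded.fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```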

2.2.2.2 Feature Extraction Approach for Dimensionality Reduction

Feature extraction involves deriving discriminatory information from a collection of samples. To extract medically useful information from image textures, features must be computed. These features, which may not be visually apparent but are relevant to the diagnostic problem, can be thought of as supplements to the researcher’s visual abilities. A variety of feature extraction techniques are used in medical image processing to obtain effective and distinctive features; a few of them are explained below [83], followed by a short code sketch of the classical texture descriptors.

  1. (a)

    Gray-Level Co-occurrence Matrix (GLCM): In medical image processing, GLCM is a popular texture analysis approach. It entails calculating the likelihood of pixel values co-occurring at particular pixel distances and directions in an image. The co-occurrence matrix is then used to compute various statistical measures such as contrast, correlation, energy, and homogeneity. These measures can be used as features to characterise the texture of an image and serve as inputs to classification or segmentation tasks [84].

  2. (b)

    Local Binary Patterns (LBP): LBP is a simple yet powerful texture analysis tool. It entails comparing each pixel in an image to its neighbours and assigning a binary value based on whether each neighbour has a greater or lower value than the centre pixel. Repeating this procedure for every pixel produces a binary pattern, and these patterns can then be used as features to describe the texture of an image [85].

  3. (c)

    Gabor Wavelets: Gabor wavelets are a type of filter used to analyse the frequency and orientation content of an image. They are particularly useful for analysing texture because they can capture both the fine and coarse details of an image. Gabor wavelets can be used to extract features such as mean amplitude, mean frequency, and orientation for classification or segmentation tasks [86].

  4. (d)

    Histogram of Oriented Gradients (HOG): HOG is a feature extraction technique that involves computing the gradient magnitude and direction of an image and then grouping these gradients into histograms based on their orientation. These histograms can then be used as features to describe the texture of an image. HOG has been shown to be particularly effective for object detection and recognition tasks [87].

  5. (e)

    Convolutional Neural Networks (CNN): CNNs are a popular deep learning architecture used for feature extraction from medical images. These networks automatically recognise and extract characteristics from images by means of convolutional layers, which apply filters to an image and allow specific information and patterns to be extracted. This enables CNNs to effectively capture complex spatial relationships and distinctive features in medical images. The output of the convolutional layers can then be used as features for classification or segmentation tasks. CNNs have been demonstrated to be extremely successful for a variety of medical image processing applications, including tumour identification and segmentation [88].

  6. (f)

    Recurrent Neural Network (RNN): RNN is a kind of neural network designed to process sequential data, where the output at each step is influenced by previous steps. It has feedback connections that allow information to persist across different time steps, making it appropriate for problems like speech recognition, language modelling, and time series analysis. RNNs have a recurrent hidden state that captures and updates information as new input is fed into the network. The vanishing gradient problem, which affects traditional RNNs, limits their ability to capture long-term dependencies [89].

  7. (g)

    Long Short-Term Memory (LSTM): The LSTM is a recurrent neural network developed to address the vanishing gradient problem in typical RNNs. It includes a memory cell that enables the network to selectively store, read, and write information over long sequences. LSTMs have gated mechanisms, comprising input, forget, and output gates, that regulate information flow and enable the network to retain pertinent data over time. LSTMs are highly helpful for modelling long-term dependencies in sequential data and are frequently employed in problems such as machine translation, speech recognition, natural language processing, and time-series prediction [90].
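
As a concrete illustration of the classical texture descriptors listed above (GLCM, LBP, and HOG), the following minimal sketch computes them with scikit-image. The random stand-in image, the parameter values, and the assumption of a recent scikit-image version (which exposes graycomatrix/graycoprops under these names) are illustrative choices, not taken from the cited works.

```python
# Sketch: classical texture feature extraction (GLCM, LBP, HOG) with scikit-image.
# The random stand-in "image" and all parameter values are illustrative assumptions.
import numpy as np
from skimage.feature import graycomatrix, graycoprops, hog, local_binary_pattern

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in for a grey-level patch

# GLCM: co-occurrence of grey levels at distance 1 in the horizontal direction,
# summarised by the usual statistical measures.
glcm = graycomatrix(image, distances=[1], angles=[0], levels=256, symmetric=True, normed=True)
glcm_feats = [graycoprops(glcm, p)[0, 0] for p in ("contrast", "correlation", "energy", "homogeneity")]

# LBP: compare each pixel with 8 neighbours at radius 1 and summarise as a histogram.
lbp = local_binary_pattern(image, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

# HOG: histograms of gradient orientations over small cells.
hog_feats = hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

feature_vector = np.concatenate([glcm_feats, lbp_hist, hog_feats])
print(feature_vector.shape)
```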

2.2.2.3 Linear and Non-Linear Approaches of Feature Extraction

The practice of lowering the number of variables in a dataset while maintaining the important information and relationships is known as dimensionality reduction. Dimensionality can be reduced in two ways: linearly and non-linearly. Linear dimensionality reduction techniques combine the original variables in linear combinations to create a new set of variables that retains the majority of the crucial information inherent in the original data. Principal component analysis (PCA), a linear dimensionality reduction method, determines the directions of the data that account for the most variance. Non-linear dimensionality reduction techniques, on the other hand, employ non-linear transformations of the original variables to generate a new set of variables that effectively capture the significant details within the data. Examples of non-linear dimensionality reduction techniques include Isomap, t-SNE, and UMAP. These techniques are useful when the relationships between the variables in the data are non-linear [91].

Both linear and nonlinear dimensionality reduction solutions have advantages and disadvantages, and the methodology utilized is determined by the specific scenario and data type. Dimensionality reduction plays a critical role in various machine learning and data processing analysis applications. It offers several benefits, including noise reduction, enhanced visualization capabilities, and improved performance of other machine learning algorithms.
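
The following minimal sketch contrasts a linear projection (PCA) with a non-linear embedding (t-SNE) using scikit-learn; the digits dataset and parameter values are illustrative assumptions, not a recommendation for any particular application.

```python
# Sketch: linear (PCA) versus non-linear (t-SNE) dimensionality reduction with scikit-learn.
# The digits dataset and parameter values are illustrative assumptions.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)            # 64-dimensional image features

X_pca = PCA(n_components=2).fit_transform(X)   # linear projection maximising variance
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)  # non-linear embedding

print(X.shape, "->", X_pca.shape, X_tsne.shape)
```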

2.2.2.4 Feature Classification Algorithms

Feature classification is the process of assigning data to multiple classes based on criteria derived from their features, and it uses a variety of techniques, including:

  1. A.

    Support Vector Machine (SVM): Support Vector Machines (SVMs) are supervised learning algorithms used in both classification and regression tasks. They belong to the family of generalised linear classifiers. SVM is distinguished by its ability to maximise the geometric margin while simultaneously minimising the empirical classification error; for this reason, SVMs are also known as maximum-margin classifiers. SVMs construct a maximally separating hyperplane and create two parallel hyperplanes on either side of it. By mapping input vectors to a higher-dimensional space, SVM seeks the separating hyperplane that yields the largest distance between these parallel hyperplanes. According to the theory, increasing this distance, or margin, reduces the classifier’s generalization error [92].

  2. B.

    Radial Basis Function (RBF): RBF is a classification method that uses nonlinear activation functions, such as the sigmoidal and Gaussian kernels, for function approximation and classification. The Gaussian function’s response is positive for all values of x and approaches zero as the distance from the kernel’s centre tends to infinity. As its name implies, the RBF is radially symmetric, yielding the same response for any input values that are equidistant from the kernel’s centre [93].

  3. C.

    K-Nearest Neighbour (KNN): KNN is one of the most fundamental approaches for classification and regression. To predict the class of a new sample, the KNN model assigns it to one of the existing classes based on the majority vote of its k nearest neighbours; for regression, the mean (average) value of the k closest neighbours is used instead. Nearness is typically measured using the Euclidean distance [94].

  4. D.

    Linear Discriminant Analysis (LDA): Linear discriminant analysis (LDA) is a widely used linear dimensionality reduction technique for classification and pattern recognition applications. LDA seeks a linear combination of the variables in the data that best separates the data’s multiple classes. Unlike PCA, which seeks to capture the maximum variance in the data, the objective of LDA is to identify the directions that maximize the separation between classes. LDA is particularly valuable when the classes in the data are well-separated and exhibit similar covariance matrices. In such cases, LDA can provide a better representation of the data for classification than other linear dimensionality reduction techniques like PCA [95].

In addition to its use for classification and pattern recognition, LDA is also used for data visualization, feature extraction, and for reducing the dimensionality of data for other machine learning algorithms. LDA is a rapid and computationally efficient algorithm that is frequently used in image and audio recognition, text classification, and bioinformatics applications.
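
As an illustration of how such classifiers can be compared, the following sketch evaluates an RBF-kernel SVM, k-NN, and LDA with tenfold cross-validation on the breast cancer dataset bundled with scikit-learn; the models, preprocessing, and parameter values are illustrative assumptions rather than a prescription from the studies reviewed later.

```python
# Sketch: comparing SVM (RBF kernel), k-NN, and LDA on the scikit-learn
# breast cancer dataset. Parameter values are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
    "k-NN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "LDA": LinearDiscriminantAnalysis(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```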

2.2.2.5 Reinforcement Learning

Reinforcement Learning (RL) is a method within the field of machine learning that empowers an entity, known as an agent, to acquire knowledge and improve its performance through interactions with its environment in order to maximise a reward signal. In the context of reinforcement learning, an agent interacts within an environment to achieve a specific goal. The agent receives feedback in the form of rewards or penalties based on its actions and decisions. In response to this feedback, the agent adjusts its behavior with the objective of incrementally maximizing the cumulative reward over time [96].

Reinforcement learning has been utilised in a variety of areas such as robotics, games, banking, and transportation. RL has been used in robotics to train robots to perform tasks such as grasping and manipulation, whilst in gaming, it has been utilised to construct AI agents capable of playing games at a human-like level. In finance, RL has been used to optimize portfolio selection and algorithmic trading, and in transportation, it has been used to develop autonomous driving systems [97].

Reinforcement learning is a highly effective tool for tackling intricate problems where a clear mathematical formulation of the problem is not available. It is especially beneficial in situations where an agent needs to learn from experience and make decisions based on its current state and environment. However, reinforcement learning can be challenging to implement and requires a lot of data to train the agent effectively. Additionally, it can be difficult to design effective reward functions, and the agent’s behavior may not always align with the desired outcome [98].

There are two primary approaches to reinforcement learning algorithms: value-based methods and policy-based methods (Fig. 7).

Fig. 7 Reinforcement learning workflow [99]

2.3 Learning Through Interaction

Value-based approaches estimate the value of taking a certain action in a specific state. The value of an action measures how advantageous it is for the agent to take that action in the current state. The agent determines which actions to take by assessing the estimated values. Value-based methods include Q-Learning and SARSA [100].

Policy-based approaches, on the other hand, are concerned with estimating a policy directly, which is a mapping from states to actions. The policy defines the optimal action to take in each state, and the agent uses the estimated policy to make decisions. Policy-based methods include REINFORCE and Proximal Policy Optimization (PPO) [101].

In addition to value-based and policy-based strategies, there are actor-critic methods, which incorporate the qualities of both. Because evaluating the value of an action in a given state and selecting the optimal action are both crucial tasks in decision-making, actor-critic approaches combine a value function (the critic) with a policy function (the actor).

The selection of a reinforcement learning algorithm depends on the nature and complexity of the problem at hand and on the data. In some cases, value-based methods may be more appropriate, while in others, policy-based methods may be more suitable. The choice is also determined by the available computational capabilities and the amount of data available for training.

2.3.1 Widely Used Reinforcement Learning Algorithms

2.3.1.1 Q-learning

Q-learning is a reinforcement learning technique employed to determine the optimal strategy for selecting actions within a Markov Decision Process (MDP) framework. In an MDP setting, an agent interacts with the environment, taking actions that lead to new states and yield corresponding rewards. The agent’s goal is to maximise the overall accrued reward over time. Q-learning iteratively updates a Q-value function, which estimates the expected utility of carrying out a certain action in a given state. The Q-value function, denoted as Q(s, a), represents the expected cumulative reward that an agent can achieve by taking action “a” in state “s” and then following the optimal course of action thereafter. Selecting, in each state, the action that maximizes the expected cumulative reward yields the optimal policy. The Q-value function is updated iteratively using the Bellman equation, which takes into account the immediate reward, the discounted future rewards, and the Q-values of the next state-action pairs.

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$$

In the Q-learning process, the Q-value function is updated using the Bellman equation, which takes into account the reward ‘r’ obtained for performing action ‘a’ in state ‘s’, the discount factor ‘γ’ (a parameter that determines the significance of future rewards), and the resulting state ‘s′’. The term max(Q(s′, a′)) reflects the projected cumulative reward that the agent may expect to obtain in the next state ‘s′’ by choosing the action that maximises the Q-value function. Q-learning begins by arbitrarily initialising the Q-value function, and then iteratively updates the Q-values for every observed (s, a, r, s′) transition using the Bellman equation. Through these iterative updates, the Q-value function progressively converges towards the optimal Q-values, which correspond to the actions that maximize the expected cumulative reward. By selecting the action that maximizes the Q-value function for a given state, the agent can determine the optimal policy to follow. Q-learning is a model-free technique, which means that prior knowledge of the MDP transition and reward functions is not required. It is extensively utilised in numerous applications, such as game playing, robotics, and control systems. However, Q-learning may suffer from slow convergence and high variance due to the stochastic nature of the environment, and various improvements have been proposed, such as SARSA and Deep Q-Networks (DQN) [102].
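
The following minimal sketch shows a single tabular Q-learning update driven by an epsilon-greedy policy; the toy state and action spaces, the transition values, and the hyperparameters are illustrative assumptions.

```python
# Sketch: one step of tabular Q-learning (off-policy). The toy environment,
# epsilon-greedy policy, and hyperparameters are illustrative assumptions.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def epsilon_greedy(state):
    if rng.random() < epsilon:              # explore with probability epsilon
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))         # otherwise exploit the current estimate

def q_learning_update(s, a, r, s_next):
    # Bellman target uses the greedy (maximising) action in the next state.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Example transition (s, a, r, s'): values chosen arbitrarily for illustration.
q_learning_update(s=0, a=epsilon_greedy(0), r=1.0, s_next=1)
print(Q)
```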

2.3.1.2 SARSA (State-Action-Reward-State-Action)

A reinforcement learning method called SARSA (State-Action-Reward-State-Action) was created for sequential decision-making in an environment. It is an on-policy algorithm, which means that it updates the same policy that it follows while learning. The goal of SARSA is to discover the best Q-values for each state-action pair in the environment. The Q-value represents the predicted cumulative return that the agent will obtain by performing a certain action in a particular state. SARSA modifies the Q-values based on the observed reward, the agent’s subsequent state, and the action it decides to take next. The Q-values are modified using the following update rule:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \, Q(s', a') - Q(s, a) \right)$$

The learning rate (alpha), the immediate reward (r) received by the agent, the discount factor (gamma) for future rewards, the next state (s′), and the subsequent action (a′) decided upon by the agent are all factors in updating the Q-value for a state-action pair (s, a) in SARSA. SARSA uses an epsilon-greedy approach to strike a balance between exploration and exploitation. With a probability of 1-epsilon, the agent prioritises exploitation and chooses the action with the highest Q-value; with a probability of epsilon, it chooses a random action, promoting exploration. The value of epsilon is typically decreased over time to encourage the agent to rely more on its learned policy and explore less frequently. SARSA is appropriate for environments in which the agent’s actions have a direct impact on the subsequent states. When dealing with environments with continuous states, a function approximator such as a neural network is used, which either outputs a probability distribution across the action space or takes the state as input and outputs the Q-values for all viable actions in that state. Such a network is trained with backpropagation and stochastic gradient descent by reducing the mean squared error between the predicted and target Q-values or policies as determined by the Bellman equation. Deep reinforcement learning algorithms can likewise be divided into value-based and policy-based methods. Value-based methods, such as Deep Q-Networks (DQNs), aim to learn the best Q-value function, which assesses the predicted cumulative reward for each state-action pair. Actor-Critic algorithms, on the other hand, are policy-based techniques that concentrate on directly learning the best policy [103].
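
For contrast with the Q-learning sketch above, the following minimal sketch shows a single SARSA update; the key difference is that the target bootstraps on the action actually chosen by the epsilon-greedy policy rather than on the greedy maximum. The toy environment and hyperparameters are again illustrative assumptions.

```python
# Sketch: one SARSA (on-policy) update step. The toy Q table, epsilon-greedy
# policy, and hyperparameters are illustrative assumptions.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(1)

def epsilon_greedy(state):
    return int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[state]))

def sarsa_update(s, a, r, s_next, a_next):
    # Unlike Q-learning, the target uses the action a_next that the current
    # epsilon-greedy policy actually selected in the next state.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

# Example transition (s, a, r, s', a'): values chosen arbitrarily for illustration.
s, a = 0, epsilon_greedy(0)
r, s_next = 1.0, 1
sarsa_update(s, a, r, s_next, epsilon_greedy(s_next))
print(Q)
```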

Actor-critic methods, which use deep neural networks to represent both the value function and the policy function, have made considerable strides recently in merging the two approaches. Deep Deterministic Policy Gradient (DDPG) and Trust Region Policy Optimisation (TRPO), among others, are notable examples of these techniques. These methods use deep neural networks to simultaneously learn and optimise the policy and value functions, which improves performance and flexibility on reinforcement learning tasks. One of the main issues with deep reinforcement learning is the unstable nature of learning, which can result from non-stationary targets and the correlation between samples. To tackle this problem, numerous techniques have been developed, such as experience replay, target networks, and batch normalization. Overall, deep reinforcement learning has demonstrated significant promise in addressing difficult issues that were previously assumed to be beyond the reach of classic reinforcement learning approaches. However, it still requires careful tuning of hyperparameters and significant amounts of training data to achieve good results.

3 Pros and Cons of Machine Learning Systems

See Table 3.

Table 3 Summarizes the advantages and disadvantages of each type of machine learning system [12]

4 Application of Machine Learning in Cancer Classification

In a study, Sung-Bae Cho and Hong-Hee used three benchmark datasets to evaluate various features and classifiers. Their goal was to objectively assess feature selection techniques and machine learning classifiers. The Leukaemia cancer dataset, the Colon cancer dataset, and the Lymphoma cancer dataset served as the study’s benchmark datasets. To choose the features, they considered a number of criteria, including the signal-to-noise ratio, information gain, mutual information, Euclidean distance, Pearson’s and Spearman’s correlation coefficients, and the cosine coefficient. They used support vector machines, multi-layer perceptrons, k-nearest neighbours, and structure-adaptive self-organizing maps for the classification process. They also integrated classifiers to boost classification performance. According to the experimental findings, the ensemble of multiple basic classifiers achieved the highest recognition rate on the benchmark datasets [104].

Using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, Nikita Rane and her fellow researchers conducted a study in which they examined six different machine learning methods. Naive Bayes (NB), Random Forest (RF), Artificial Neural Networks (ANN), k-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Decision Tree (DT) were the algorithms they considered in their study. The dataset is derived from digitized images of fine needle aspirates of breast masses. The machine learning algorithms were applied to the dataset in training and testing phases. The best-performing algorithm was to be used in the website’s backend, and the model would categorize tumours as benign or malignant [105].

Epimack Michael and his coworkers presented a computer-aided diagnostic (CAD) system that can automatically design an optimal algorithm. Out of the 185 possible attributes, they selected 13 features to train the machine learning models. To differentiate between cancerous and benign tumours, they applied five different machine learning classifiers. The results of the experiment showed that employing a tree-structured Parzen estimator for Bayesian optimization of a machine learning classifier with tenfold cross-validation yielded promising outcomes. LightGBM emerged as the top performer among the five classifiers, achieving an accuracy of 99.86%, precision of 100.0%, recall of 99.60%, and F1 score of 99.80% [70].

A novel approach for classifying and segmenting skin lesions using image processing and machine learning was put forth by Javaid et al. For image segmentation, they employed contrast stretching and OTSU thresholding, and they extracted features including GLCM, HOG, and colour features. They used SMOTE sampling to address class imbalance and PCA to reduce dimensionality. They carried out feature selection and employed Random Forest, SVM, and Quadratic Discriminant classifiers for classification. On the ISIC-ISBI 2016 dataset, their suggested approach achieved an accuracy of 93.89% [106].

David A. Omondiagbe and colleagues developed a combination technique to identify breast cancer by lowering the complexity. The proposed method involved utilizing linear discriminant analysis (LDA) to reduce the feature set, followed by implementing Support Vector Machine using the reduced features. In terms of performance, the approach achieved an accuracy of 98.82%, sensitivity of 98.41%, specificity of 99.07%, and an area under the receiver operating characteristic curve of 0.9994 [107].

A novel technique for choosing genes from gene expression data was created by Guyon et al. utilizing Support Vector Machine techniques with Recursive Feature Elimination. In contrast to other approaches, their research demonstrated that the chosen genes produced higher classification performance and more condensed gene subsets [10].

Dr. Shahin Ali and associates developed a deep convolutional neural network (DCNN) to reliably differentiate normal from cancerous skin lesions. During the preprocessing stage, the input images are filtered to remove noise and artefacts, normalised, and feature extraction is carried out to aid in correct classification. To enhance the quantity of images and improve categorization accuracy, data augmentation techniques are employed. The performance of the DCNN model is compared to several transfer learning models such as AlexNet, ResNet, VGG-16, DenseNet, and MobileNet. The evaluation of the model’s effectiveness is conducted using the HAM10000 dataset. The training accuracy of the model was determined to be 93.16%, while the testing accuracy reached 91.93% [89].

The goal of the study conducted by Nurul Amirah Mashudi et al. was to assess how well various machine learning algorithms, including Support Vector Machine (SVM), Random Forest, and k-Nearest Neighbours (k-NN), performed at classifying benign and malignant breast cancers. The scientists also used ensemble methods and tenfold cross-validation to forecast breast cancer survival. The recommended methodologies were further tested using twofold, threefold, and fivefold cross-validation in order to obtain the highest practical accuracy rate, reported as 91.93%, while an overall accuracy rate of 70% was also observed by the research [108].

The technique proposed by Khalil Maalmi et al. consists of two parts. Firstly, Association Rules (AR) are employed to eliminate unnecessary features. Secondly, a number of classifiers are used to classify incoming tumours; with AR, the feature space is reduced to eight or four attributes instead of nine. Using the Wisconsin Breast Cancer Diagnostic (WBCD) dataset from the University of California Irvine machine learning repository, the performance of the proposed system is assessed during the test phase using a threefold cross-validation technique. The greatest classification accuracy for the Support Vector Machine (SVM) model with AR was 98.00% for eight features and 96.14% for four attributes [109].

An innovative method for classifying breast cancer, dubbed DLXGB and developed by Xin Yu Liew, analyses histopathology images of breast cancer from the BreaKHis dataset using Deep Learning and extreme Gradient Boosting algorithms. Using pre-processing methods including data augmentation and stain normalization, a pre-trained DenseNet201 is used to learn image characteristics, which are then combined with a potent gradient boosting classifier. Adenosis, Fibroadenoma, Phyllodes Tumor, Tubular Adenoma, Ductal Carcinoma, Lobular Carcinoma, Mucinous Carcinoma, and Papillary Carcinoma are the eight categories used to classify the breast cancer histology images, in addition to the binary benign and malignant categories. Using the BreaKHis dataset, the suggested DLXGB approach outperformed previous studies with an accuracy of 97% for both binary and multi-classification [110].

Support Vector Machine (SVM), Logistic Regression (LR), and Neural Network (NN) were three machine learning approaches used in a study by Kristoffersen et al. To differentiate between benign and malignant breast cancer, a research study was conducted using the breast cancer Wisconsin diagnostic (BCWD) dataset. The study aimed to evaluate various machine learning techniques by testing multiple models, each with its own unique parameter values. The performance of these models was assessed using a confusion matrix and k-fold cross-validation. The results obtained through the k-fold cross-validation method revealed that Support Vector Machine (SVM) outperformed Logistic Regression (LR) and Neural Networks (NN) in terms of classification accuracy, precision, recall, and specificity. However, when using train-test split validation, the Neural Network model achieved the highest accuracy at 99.4%, surpassing both SVM and LR [111].

The output layer of two novel hybrid CNN models developed by Duggani Keerthana and her associates uses an SVM classifier to categorise dermoscopy pictures as benign or malignant lesions. The suggested model utilizes two Convolutional Neural Network (CNN) models to extract characteristics of the data. The extracted features are then concatenated and fed into an SVM classifier for classification. To assess this model’s performance, the predicted outcomes are compared to the labels assigned by a dermatological expert. This allows for an assessment of how well the model performs in classifying the data accurately [112].

Kosmia Loizidou et al. conducted a comprehensive study that reviewed recent research on the automated detection and/or classification of breast cancer in mammograms. Their study covered both traditional feature-based machine learning methods and deep learning approaches. The paper contrasts algorithms designed to detect and/or classify microcalcifications and masses, two different forms of breast abnormality, and explores the use of sequential mammograms as a means to enhance performance. The authors also discuss open access mammography datasets and present various FDA-approved CAD systems for the triage and detection of breast cancer in mammograms. The study closes by pointing out potential avenues for further research in this area. This comprehensive overview can serve as an introduction to the field and provide direction for upcoming research applications [113].

A new method for the classification of lung CT scans has been developed by Ebtasam Ahmad Siddiqui et al. Their approach combines Gabor filters with an enhanced Deep Belief Network (E-DBN) that incorporates several classification techniques. The E-DBN consists of two cascaded RBMs: a Gaussian-Bernoulli (GB) RBM and a Bernoulli-Bernoulli (BB) RBM. The authors obtained the best performance parameters among all the applicable approaches by using a support vector machine (SVM). The suggested model combines an SVM with an E-DBN to improve the precision, sensitivity, specificity, F1 score, false positive rate (FPR), false negative rate (FNR), and ROC curve of lung CT image classification. Three publicly accessible datasets, including the LUNA-16 and LIDC-IDRI datasets, were used to test and assess the suggested technique [114].

In order to diagnose breast cancer, Md. Mehedi Hassan et al. compare multiple machine learning models using various categorization techniques. They use techniques like correlation matrices, histograms, and data distribution for systematic data collection, preparation, transformation, and exploratory analysis. To determine the most crucial characteristics, they also use the Least Absolute Shrinkage and Selection Operator (LASSO) technique. For evaluation and analysis, the study uses the techniques Logistic Regression, K-Nearest Neighbours, Extreme Gradient Boosting, Gradient Boosting, Random Forest, Multilayer Perceptron, and Support Vector Machine. Notably, their findings demonstrate that Random Forest outperforms the LASSO method with a maximum accuracy of 90.68%. Furthermore, K-Nearest Neighbors achieves a recall of 98.80%, Multilayer Perceptron exhibits a precision of 92.50%, and Random Forest attains an F1 score of 94.60% [115].

For categorization objectives, Abdullah-Al Nahid et al. apply novel deep neural network (DNN) approaches led by structural and statistical data from biological breast cancer images (BreakHis dataset). To categorise the breast cancer photos, they suggest using a Convolutional Neural Network (CNN), a Long-Short-Term Memory (LSTM), or a mix of CNN and LSTM. Following the feature extraction step utilising the suggested DNN models, the decision-making stage makes use of Softmax and Support Vector Machine (SVM) layers. The experimental findings demonstrate the best precision of 96.00% on the 40 × dataset, the best F-Measure on both the 40 × and 100 × datasets, and the greatest accuracy of 91.00% on the 200 × dataset [116].

In a study, Mehedi Masud and associates analysed histopathological images to create a classification system that could discriminate between two benign and three malignant forms of lung and colon tissues, making a total of five different types. Their suggested framework produced encouraging outcomes, with the ability to identify malignant tissues with a maximum accuracy of 96.33%. The results indicate that this model may be used as an automated and trustworthy method for medical professionals to correctly diagnose various forms of colon and lung cancer [117].

From the literature survey, we conclude that determining the “best” machine learning algorithm for cancer classification depends on several factors, such as the specific characteristics of the dataset, the size of the dataset, the nature of the cancer classification problem, and the desired performance metrics. There is no one-size-fits-all answer, as different algorithms may perform differently in various scenarios. However, some commonly regarded powerful algorithms for cancer classification tasks include:

  1. 1.

    Support Vector Machines (SVM): Capable of handling both linear and non-linear classification tasks, and effective for processing high-dimensional data.

  2. 2.

    Random Forests: Robust and versatile; an ensemble learning method that combines multiple decision trees for improved accuracy.

  3. 3.

    Gradient Boosting methods (e.g., XGBoost, AdaBoost): Can effectively handle imbalanced datasets and often achieve high predictive performance.

  4. 4.

    Deep Learning models (e.g., Convolutional Neural Networks): Particularly useful for image-based cancer classification tasks, leveraging complex patterns and features (Table 4).

Table 4 Comparative Study Table for the Comparison of Different Machine-Learning Techniques [118,119,120,121,122]

5 Discussion on Machine Learning

Overview of the Findings: The comprehensive study found that machine learning algorithms had promising results for classifying cancer. Numerous machine learning techniques, such as Support Vector Machines (SVM), Artificial Neural Networks (ANN), Random Forest, Decision Trees, and Deep Learning, have been used to categorise various types of cancer [14, 123,124,125,126].

The results of the study indicate that these techniques have high accuracy and can effectively differentiate between cancerous and non-cancerous tissues. Additionally, these techniques can also identify different subtypes of cancer and predict the likelihood of cancer recurrence. The potential clinical applications of these techniques are also noteworthy. Machine learning algorithms can help in early cancer diagnosis, patient stratification, personalized treatment planning, and monitoring of cancer progression. These techniques can also aid in drug discovery and development. However, there are some limitations to the use of these techniques in cancer classification, such as the requirement for extensive and varied datasets, potential bias in data selection, and the interpretability of the models.

Overall, the systematic review suggests that machine learning techniques have great potential in cancer classification and can significantly enhance the detection and management of cancer.

5.1 Significance of the Findings

The findings of the systematic review have significant implications for cancer diagnosis, prognosis, and treatment. The application of machine learning techniques to the classification of cancer can lead to more precise and effective diagnosis as well as better patient outcomes. The ability of machine learning to analyze massive volumes of data rapidly and accurately is a key advantage it has over conventional approaches [127]. This can help in identifying subtle differences in cancer types, subtypes, and stages, which can be missed by human experts. Additionally, machine learning can also help in identifying potential biomarkers and drug targets [128], which can lead to the development of new and more effective cancer therapies. The use of machine learning in cancer classification can also improve patient stratification, allowing clinicians to provide personalized treatment plans based on individual patient characteristics. This can help in reducing treatment-related adverse effects and improving treatment efficacy. However, there are some limitations and challenges to implementing these techniques in clinical practice. One significant limitation is the need for large and diverse datasets to train the models accurately [19]. Additionally, there is a potential risk of bias in data selection, which can affect the performance of the models. There is also a need to address the interpretability of the models to ensure that clinicians can understand the reasoning behind the model’s predictions.

Despite these difficulties, there are substantial potential advantages to employing machine learning for cancer categorization. The systematic review’s conclusions emphasize the need for more study and advancement in this field in order to fully realize machine learning’s promise for cancer diagnosis, prognosis, and therapy.

5.2 Comparison with Previous Studies

Previous studies have also investigated the use of machine learning techniques in cancer classification, and the results of the present investigation agree with those studies. The high accuracy of machine learning approaches in the categorization of cancer has been documented in a number of works. For instance, a study by Esteva et al. showed that a deep-learning system can accurately identify skin cancer in photographs with performance that is on par with or superior to that of board-certified dermatologists [129]. Similarly, another study by S. Cui et al. reported that a deep learning algorithm could effectively classify lung nodules on CT images [130]. The results of the current study, which also show the high accuracy of machine learning techniques in cancer classification, are consistent with these findings. However, there are also some discrepancies between the current findings and previous studies. For example, some studies have reported lower accuracy rates for machine learning algorithms in cancer classification. These discrepancies could be due to differences in the datasets used, the machine learning techniques employed, or the specific cancer types being classified.

Overall, the current study’s findings are consistent with previous studies, which have demonstrated the usefulness of machine learning methods in the categorization of cancer. To address some of the restrictions and difficulties related to the application of these approaches in clinical practice, more study is necessary.

Some of the limitations which we addressed from the literature review are:

  1. 1.

    Availability and quality of labeled data: The limited availability of large-scale, well-annotated datasets for training machine learning models in cancer classification poses a challenge. We will highlight the importance of access to a range of diverse and representative datasets to improve the models’ precision and generalizability.

  2. 2.

    Class imbalance: Imbalanced class distributions in cancer datasets can affect the performance of machine learning algorithms, since they frequently favour the dominant class. To solve this issue and enhance the categorization of minority classes, we will discuss strategies like oversampling, undersampling, and cost-sensitive learning.

  3. 3.

    Feature selection and dimensionality: Cancer datasets often contain many features, some of which can be irrelevant or redundant. We will investigate feature selection strategies and dimensionality reduction techniques to increase the efficiency and effectiveness of cancer classification models.

  4. 4.

    Interpretability and transparency: Machine learning models used for cancer classification often exhibit a black-box nature, making it challenging to interpret the underlying reasoning behind their predictions. We will discuss the importance of model interpretability and potential methods, such as feature importance analysis and model-agnostic interpretability techniques, to improve transparency and trust in the decision-making process.

In addition to discussing these issues, we outline some objectives for improving cancer classification, which may include:

  • Developing novel machine learning algorithms specifically tailored for cancer classification to improve accuracy and interpretability.

  • Integrating multimodal data sources, such as genomics, imaging, and clinical data, to enhance the predictive power of machine learning models.

  • Exploring ensemble learning techniques and model combination approaches to synthesize the advantages of multiple methods and augment overall effectiveness.

  • Investigating the potential of deep learning models like convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to effectively capture intricate patterns and temporal relationships, resulting in improved precision when classifying cancer.

  • Conducting further research on transfer learning and domain adaptation to address the challenges of limited labeled data in specific cancer types or subtypes.

By addressing these issues and stating the objectives for improvement, the discussion section provides insights into the potential future directions and advancements in utilizing machine learning techniques to classify cancer accurately.

5.2.1 Contribution of the paper

  1. 1.

    Comprehensive review of the applications and techniques of machine learning in cancer classification.

  2. 2.

    Presentation of actual use cases of machine learning in cancer classification, demonstrating their implementation on medical data.

  3. 3.

    Discussion on supervised, unsupervised, and reinforcement learning algorithms, highlighting their advantages and disadvantages in the context of cancer classification.

  4. 4.

    Exploration of the implications and future potential of machine learning in improving cancer diagnosis, patient outcome prediction, and identification of therapeutic targets.

By organizing the contributions in bullet points, we aim to provide a clear and concise overview of the key contributions of the paper.

6 Conclusion

In conclusion, the systematic review has highlighted the promise of machine learning methods for identifying cancer. The findings suggest that these techniques have high accuracy rates and can be used to classify various cancer types, potentially aiding in diagnosis, prognosis, and treatment planning. The significance of the findings lies in the potential impact on improving cancer care, with the ability to provide faster and more accurate diagnoses, improved treatment planning, and personalized treatment strategies. The review has also identified limitations and challenges associated with the implementation of these techniques in clinical practice, including data standardization, interpretability, and ethical considerations. The study’s contribution to the field of machine learning in cancer classification is in identifying key areas of focus for future research, examining, for instance, how well machine learning models function in actual healthcare contexts, improving interpretability, and evaluating the potential of machine learning in combination with other diagnostic and treatment modalities.

Overall, the systematic review demonstrates the tremendous promise of machine learning methods for cancer classification, and more study in this field may lead to better cancer treatment and patient outcomes.

Implications for Future Research: The findings of the systematic review have several implications for future research in the field of machine learning and cancer classification.

Firstly, there is a need for more standardized approaches to data collection and analysis. This can help in ensuring that the datasets used in machine learning studies are diverse, representative, and unbiased. Additionally, there is a need to address the issue of data privacy and security to ensure that patient data is protected while still being accessible for research purposes.

Secondly, there is a need to investigate the interpretability of machine learning models in cancer classification. This can help in improving the transparency of the models and ensuring that clinicians can understand the reasoning behind the model’s predictions.

Thirdly, the effectiveness of machine learning models in actual healthcare situations has to be examined. This can help in evaluating the feasibility and clinical applicability of these techniques and identifying any potential barriers to their implementation in clinical practice.

Fourthly, there is a need to investigate the potential of machine learning techniques in combination with other diagnostic and treatment modalities. For example, combining machine learning with imaging or genomic data can provide more accurate and comprehensive cancer diagnosis and treatment planning.

Finally, there is a need to investigate the ethical implications of using machine learning in cancer classification. This can help in ensuring that the use of these techniques is consistent with ethical principles and patient rights.

In conclusion, the findings of the systematic review highlight the potential of machine learning methods in cancer classification and the need for further research to address the limitations and challenges associated with their implementation in clinical practice.