Abstract
Background
Increasing data volumes in oncology pose new challenges for data analysis. Machine learning, a branch of artificial intelligence, can identify patterns even in very large and less structured datasets.
Objective
This article provides an overview of the possible applications for machine learning in oncology. Furthermore, the potential of machine learning in patient-reported outcome (PRO) research is discussed.
Materials and methods
We conducted a selective literature search (PubMed, MEDLINE, IEEE Xplore) and discuss current research.
Results
There are three primary applications for machine learning in oncology: (1) cancer detection or classification; (2) overall survival prediction or risk assessment; and (3) supporting therapy decision-making and prediction of treatment response. Generally, machine learning approaches in oncology PRO research are scarce and few studies integrate PRO data into machine learning models.
Discussion
Machine learning is a promising area of oncology, but few models have been transferred into clinical practice. The promise of personalized cancer therapy and shared decision-making through machine learning has yet to be realized. As an equally important emerging research area in oncology, PROs should also be incorporated into machine learning approaches. To gather the data necessary for this, broad implementation of PRO assessments in clinical practice, as well as the harmonization of existing datasets, is suggested.
Summary
Background
Growing volumes of data in oncology pose new challenges for analysis. Machine learning is a branch of artificial intelligence and can identify relationships even in very large and less structured datasets.
Objective
This article provides an overview of the areas of application of machine learning in oncology. In addition, the potential of machine learning for patient-reported outcome (PRO) research is discussed.
Materials and methods
Selective literature search (PubMed, MEDLINE, IEEE Xplore) and discussion of the current state of research.
Results
There are three primary areas of application for machine learning in oncology: (1) cancer detection or classification in imaging procedures; (2) prediction of overall survival or risk assessment; and (3) support for treatment decisions and prediction of treatment response. To date, machine learning approaches have hardly been pursued in oncological PRO research and practice, and only a few studies integrate PRO data into machine learning models.
Discussion
Machine learning shows promising applications in some areas of oncology, but few models make the leap into clinical practice. The promises of personalized cancer therapy and of treatment decision support through machine learning have not yet been fulfilled. As an area that is steadily gaining importance in oncology, PROs should also be included in machine learning approaches. This, however, requires the broad, standardized collection of PROs as well as comprehensive harmonization of existing datasets.
Introduction
The growing amount of data being accumulated in the field of oncology offers manifold opportunities to deepen our understanding of cancer, but at the same time poses new challenges for data processing and analysis. Machine learning, a field of artificial intelligence, can help extract meaningful information from large amounts of data. Due to the increasing importance and use of machine learning in oncology, a basic understanding of the technology will become relevant to practising oncologists in the near future.
Background
Over the last two decades, the amount of data in the field of oncology has increased rapidly. For each individual patient, more and more data are being generated and stored. For example, in the European Innovative Medicines Initiative project OncoTrack, close to 1 terabyte (1,000 gigabytes) of data are produced per patient, which is equivalent to 250,000 photos or 6.5 million document pages [14]. This increase in stored information was sparked and promoted by the digitalization of medicine and technological advances, such as genome sequencing.
Such high volumes of data offer great potential to deepen our knowledge of cancer, but at the same time place new demands on our methods of data processing and evaluation. The limitations of the classical (inferential) statistical methods traditionally used in medicine quickly become apparent when faced with increasing amounts of data and variables. Machine learning, a branch of artificial intelligence [7], is one approach for processing large amounts of data and identifying patterns or variables of interest. There are different applications of machine learning in oncology. For example, machine learning can be used to detect tumors in images from computed tomography (CT) or functional magnetic resonance imaging (fMRI), or to estimate the risk of cancer progression based on clinical variables.
In this article, we provide a brief overview of machine learning applications in oncology. To this end, we first explain what machine learning is and what sets it apart from the traditional statistical methods currently used in medicine. We review different applications of machine learning in oncology and discuss the current state of research. Finally, we provide an outlook on how machine learning can be used in the field of patient-reported outcome (PRO) research in oncology and which developments are prerequisites for advancing research on this topic.
What are artificial intelligence, machine learning and deep learning?
The term “machine learning” refers to methods that allow a computer system to draw conclusions from data in order to improve its capabilities in a specific task in the long term. Machine learning is only one part of artificial intelligence, which also includes areas such as planning, problem-solving and logical reasoning. A machine learning model can “learn” to predict as yet unknown data or outcomes from the data it is given and thus can try to “understand” a specific problem or question. For this, the nature of the data is of particular importance. In general, a distinction is made between supervised, unsupervised and reinforcement learning. In supervised learning, each data point is associated with a concrete expected prediction result (e.g., benign tumor). The goal during the learning process is to approximate a function that maps the data to the expected outputs. In contrast, unsupervised learning aims to detect patterns in the available data without relying on feedback or expected prediction results. For example, images can be grouped by similarity. In reinforcement learning, the performance of the algorithm is continuously evaluated and rewarded or penalized to guide the learning process in the direction of the desired behavior [22].
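The difference between supervised and unsupervised learning can be illustrated with a minimal sketch. The tumor diameters, labels and function names below are hypothetical toy values chosen for illustration only; they are not taken from any of the cited studies, and the two simple methods (nearest centroid and a two-cluster grouping) merely stand in for the much larger models used in practice.

```python
from statistics import mean

# Supervised learning: every training example carries an expected result (label).
# Hypothetical toy data: (tumor diameter in mm, label).
labeled = [(4.0, "benign"), (6.0, "benign"), (5.0, "benign"),
           (18.0, "malignant"), (22.0, "malignant"), (20.0, "malignant")]

def fit_centroids(examples):
    """Learn one mean value (centroid) per label from the labeled examples."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Assign the label whose centroid lies closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

centroids = fit_centroids(labeled)
print(predict(centroids, 7.5))   # closest to the benign centroid
print(predict(centroids, 17.0))  # closest to the malignant centroid

# Unsupervised learning: the same values without labels, grouped by similarity.
def two_means(values, iterations=10):
    """Split values into two groups around two iteratively updated centers."""
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            groups[0 if abs(v - low) <= abs(v - high) else 1].append(v)
        low, high = mean(groups[0]), mean(groups[1])
    return groups

clusters = two_means([value for value, _ in labeled])
print(clusters)
```

In the supervised part the labels steer what is learned, whereas the grouping step finds the two clusters on its own and leaves their interpretation to the analyst.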
Artificial neural networks are a commonly used tool in machine learning. These networks are composed of simple neurons that, similar to neurons in the human brain, “fire” based on their interconnectedness and communicate with their linked neurons. This enables the networks to recognize and map complicated relationships. The usually large amounts of data processed in machine learning require complex neural networks composed of several layers of neurons. Methods based on such networks are generally referred to as “deep learning” and are considered a subcategory of machine learning [25]. Fig. 1 gives an overview of the terms.
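How layers of simple neurons pass information forward can be sketched in a few lines. This is a minimal illustration under stated assumptions: the weights and biases are hand-picked, hypothetical numbers (a real network learns them from data), and the sigmoid function is only one of several possible activation functions.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1): the neuron's 'firing' strength."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each neuron sums its weighted inputs, adds a bias and applies sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]

# Hypothetical, hand-picked parameters for illustration only.
hidden_w = [[0.5, -0.2], [0.1, 0.8]]  # two hidden neurons, two inputs each
hidden_b = [0.0, -0.1]
output_w = [[1.2, -0.7]]              # one output neuron fed by both hidden neurons
output_b = [0.05]

def forward(inputs):
    """Pass the inputs through the hidden layer and then the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)[0]

score = forward([1.0, 2.0])
print(score)  # a value between 0 and 1
```

Deep learning models follow the same principle but stack many more layers and neurons, which is what makes them both powerful and harder to interpret.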
Machine learning can be used for both inference and prediction and thus has some overlap with classical statistical methods. Some methods used in statistics, such as linear and logistic regression, are also applied in machine learning; however, the two approaches usually differ in their focus. While statistical methods aim at transferring properties of a sample to a population with a certain confidence, machine learning focuses on predicting and recognizing patterns in data. The latter usually involves processing large amounts of data without making specific assumptions about the nature of the data. Deep learning in particular allows the mapping of complex, non-linear relationships due to the large amounts of data and high number of parameters employed in the models; however, such models also suffer from an understandably lower interpretability of the results [7].
Applications of machine learning in oncology
Machine learning is already being used in various areas of oncology. Following the premise that more data usually lead to better models, machine learning is mainly used in areas that produce large amounts of data which are as standardized as possible. The most common applications for machine learning in oncology are presented below.
Machine learning for diagnosis and screening
One of the main applications of machine learning methods is the diagnosis and screening of cancer. Comprising both radiology and pathology, the majority of machine learning research in oncology has been conducted in this area [8, 23]. Models can be trained to recognize malignant tissue in images produced by imaging techniques. For example, Tamashiro et al. were able to successfully use convolutional neural networks (a neural network specialized in image processing) to detect oropharyngeal carcinomas on endoscopy images [27]. Their study not only shows how accurately such an algorithm can work (all carcinomas in the validation dataset were correctly detected), but also showcases the time efficiency of using such a model: all 1912 images of individual patients were correctly classified in under 30 s.
In another study by Google Health researchers, neural networks were used to detect cancer on mammography images—a procedure that still suffers from a high rate of false positives [17]. Compared to six radiologists’ ratings, the algorithm showed a superior malignant tissue detection rate and a significantly lower rate of false positives.
A recent systematic review of studies that compared human and artificial intelligence judgement concluded that algorithms can achieve similar or better detection of cancer in images compared to human experts [19]; however, the review also showed that external validation of the algorithms is often lacking (i.e., validation of the model on an independent dataset), which limits the generalizability of the models.
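The detection rate and false-positive rate discussed above, and the reason external validation matters, can be made concrete with a short sketch. All labels and predictions below are invented toy values (1 = cancer, 0 = no cancer) and do not reproduce results from any cited study; the function name is likewise hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (sensitivity, false_positive_rate) for binary labels (1 = cancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct all-clears
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical predictions on data from the development site (internal validation).
internal_truth = [1, 1, 1, 1, 0, 0, 0, 0]
internal_pred = [1, 1, 1, 1, 0, 0, 0, 1]

# Hypothetical predictions of the same model on an independent site (external validation).
external_truth = [1, 1, 1, 1, 0, 0, 0, 0]
external_pred = [1, 1, 0, 0, 1, 1, 0, 0]

internal = detection_metrics(internal_truth, internal_pred)
external = detection_metrics(external_truth, external_pred)
print("internal:", internal)
print("external:", external)
```

In this constructed example the model looks strong on internal data but performs noticeably worse on the independent dataset, which is the kind of gap that only external validation can reveal.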
Machine learning for forecasting and risk assessment
The prognostic indices classically used for prognosis in oncology are often based on only a handful of medical parameters. In contrast, machine learning models can handle less structured data including many more parameters to improve their prognostic power. Depending on the data quality and quantity, different machine learning algorithms are suitable.
Elfiky et al. [10] developed a machine learning model to predict overall survival at the start of chemotherapy. They used a sample of N = 26,946 patients who started 51,774 different chemotherapies. Using decision trees, they successfully identified those patients who were at particularly high risk of dying within the next 30 days. This allowed the authors to identify patients who were still receiving (curative) chemotherapy despite a very poor prognosis. The authors concluded that machine learning algorithms could be used in the future to recommend palliative treatment for patients with poor prognosis in order to avoid unnecessary treatment burden.
In another study, researchers were able to predict survival in N = 221 patients with nonresectable pancreatic cancer using machine learning algorithms [28]. They used data from 168 patients to train the model and then validated it on data from 53 additional patients. By comparing different prediction models, the authors were able to show that machine learning had a significantly higher predictive power compared to traditional methods such as logistic regression.
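The division into a training set and a separate validation set can be sketched as follows. The sizes (168 and 53 of 221 patients) come from the study described above; the random assignment, the seed and the function name are assumptions made for illustration, since the text does not state how the patients were allocated.

```python
import random

def holdout_split(records, n_train, seed=0):
    """Shuffle a copy of the records and split it into training and validation parts."""
    shuffled = records[:]                  # leave the original list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder patient identifiers 0..220 stand in for the 221 patient records.
patients = list(range(221))
train, validation = holdout_split(patients, n_train=168)

print(len(train), len(validation))
```

The model is fitted on the training part only; the validation part is kept aside so that the reported predictive power reflects patients the model has not seen.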
While such studies showcase the impressive achievements of machine learning, they also illustrate a more general problem: most of the developed models are very specific since they are based on data from particular diseases or patient groups.
Machine learning for predicting treatment response and supporting treatment decisions
Although personalized cancer care is becoming increasingly important, the respective treatment approaches are often not individually tailored to the patients. Machine learning can be used to pursue and promote personalized approaches in therapy selection or adaptation. Agius et al. [1] describe the development of a composite machine learning model (the model combines 28 separate models) that identified newly diagnosed patients with chronic lymphocytic leukemia who were at particularly high risk for infections. For this patient group, infections are the most common cause of death. Using data from N = 4,149 patients, the researchers created a model based on measured clinical parameters and individual risk factors. Their model, which can be viewed online (www.CLL-TIM.org), allowed them to initiate treatment at an early stage for patients who had a high risk for infections.
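The general idea of a composite model, in which several separate models contribute to one prediction, can be sketched as below. This is not the method of Agius et al.: simple averaging is an assumed combination rule, and the three base models, their thresholds, the variable names and the 0.5 cut-off are all hypothetical.

```python
from statistics import mean

def ensemble_risk(models, patient):
    """Average the risk estimates (0 to 1) returned by several base models."""
    return mean(model(patient) for model in models)

# Three hypothetical base models; each looks at a different piece of information.
def age_model(p):
    return 0.8 if p["age"] >= 70 else 0.3

def lab_model(p):
    return 0.7 if p["lab_value"] > 50 else 0.2

def history_model(p):
    return 0.9 if p["prior_infections"] > 0 else 0.1

models = [age_model, lab_model, history_model]
patient = {"age": 74, "lab_value": 30, "prior_infections": 1}  # invented example patient

risk = ensemble_risk(models, patient)  # average of 0.8, 0.2 and 0.9
high_risk = risk >= 0.5                # hypothetical decision threshold
print(round(risk, 3), high_risk)
```

Combining models in this way can reduce the influence of any single model's errors, although real ensembles usually learn how much weight each component should receive.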
Machine learning can also be used to predict treatment response. For example, Hou et al. developed a model to predict the treatment response to chemoradiotherapy based on a retrospective dataset [12]. In other studies, machine learning has been used to personalize the treatment dosage or to predict adverse events and incorporate this information into the treatment trajectory [20, 21].
The potential of machine learning for patient-reported outcome research
Due to steady improvements in the early detection and treatment of cancer, overall survival has significantly increased in recent decades [16]. Depending on the diagnosis and stage of the disease, patients with cancer now have a significantly longer life expectancy than a few decades ago. Consequently, it has become increasingly relevant how patients live with the disease and how they are affected by the disease or treatment. A study by the German Cancer Research Center showed that patients with (colorectal) cancer report at least one tumor-associated or therapy-associated symptom 10 years after initial diagnosis [15]. Information on the expected quality of life during treatment is often desired by patients but is also often undercommunicated and underdiscussed during consultations with clinicians [18]; however, precisely such information can help reduce uncertainty in patients and promote healthy coping and desirable health behavior [13]. Consequently, for treatment decisions or decisions about follow-up care, information on quality of life should supplement the clinical decision-making process.
PROs are the gold standard for assessing (health-related) quality of life. They are defined as all statements about a patient’s own health status or treatment that come directly from the patient and are not interpreted by third parties (e.g., clinicians) [11]. PROs are one of several ways to measure clinical outcomes or changes (so-called clinical outcome assessments or COAs). They can be used in clinical practice, clinical trials, and registry research, or for health technology assessments and quality assurance.
PROs assess the patient perspective on their own health status
Typically, PROs are collected in the form of standardized questionnaires to allow comparisons between patients. In this form, PRO data are a source suitable for machine learning algorithms: each scale of a questionnaire, each question, or even each individual answer can be considered a unique variable with potential predictive value. For traditional prediction models, the large number of possible predictors as well as their combinations and (non-linear) interactions, poses a problem. For machine learning models, on the other hand, a large pool of data is a potential advantage. In such scenarios, models developed with machine learning and especially deep learning techniques offer significantly higher predictive power and accuracy than what can be achieved with traditional statistics [6, 29].
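How a completed questionnaire turns into variables for a model can be sketched as follows. The item identifiers, the two scales and the 1 to 4 response range are hypothetical and do not correspond to any specific PRO instrument; unanswered items are kept as missing values rather than guessed.

```python
# Hypothetical questionnaire layout: four items grouped into two scales.
ITEMS = ["q1", "q2", "q3", "q4"]
SCALES = {"fatigue": ["q1", "q2"], "pain": ["q3", "q4"]}

def to_features(responses):
    """Turn one patient's answers into a flat dictionary of candidate predictors."""
    # Every single item becomes its own variable (None if unanswered).
    features = {item: responses.get(item) for item in ITEMS}
    # Every scale becomes an additional variable: the mean of its answered items.
    for scale, items in SCALES.items():
        answered = [responses[i] for i in items if responses.get(i) is not None]
        features[scale + "_mean"] = sum(answered) / len(answered) if answered else None
    return features

# Invented example: the patient skipped item q4.
features = to_features({"q1": 3, "q2": 4, "q3": 1})
print(features)
```

Even this small example yields six variables from four questions, which hints at how quickly the number of candidate predictors grows with real instruments.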
PROs offer information about the patients’ health status that can be an important supplement to “hard” clinical data. The obvious application of machine learning in the context of PRO research is to predict overall patient survival. In fact, there are a number of studies that show that PROs, in addition to medical parameters, can add incremental value to the prediction of overall survival in oncology (for a review, see Efficace et al. [9]); however, such research currently does not involve machine learning methods. The second important potential application of machine learning methods for PRO research lies in the prediction of expected quality of life. In the future, individualized predictions of quality of life during and after the treatment could become an important component of shared decision-making and patient empowerment.
Machine learning research using PROs: little research as of yet
Despite the previously mentioned advantages, there are hardly any studies in oncology research that integrate PROs into machine learning models. We present two studies that show the potential of PROs and machine learning in oncology. Firstly, Arkin et al. [2] used neural networks to predict the overall survival of palliative patients with cancer. In addition to clinical variables, the authors used a PRO instrument, the Edmonton Symptom Assessment Scale (ESAS), as a predictor of survival. In their study, they were able to show that the neural network had a higher predictive power than a comparable logistic regression and that the ESAS was the third most highly correlated variable (r = 0.32) with overall survival.
Secondly, Santos et al. [24] presented a study that combined PROs and machine learning models. In their study, the authors investigated how machine learning can be used to support complex decisions in critical care. They collected quality of life data from N = 777 patients with cancer, following their admission to the intensive care unit. Using different machine learning algorithms based on 37 clinical variables at admission, they predicted the patients’ quality of life-adjusted 30-day survival (meaning that survival time is corrected by the expected quality of life). The models developed by the authors thus not only predicted how long patients survive but also included information about their health status after intensive care.
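The notion of survival time corrected by quality of life can be illustrated with a simple calculation. This sketch assumes that quality of life is expressed as a utility weight between 0 and 1 and that the days survived within the 30-day window are multiplied by that weight; this is a common simplification and not necessarily the exact definition used by Santos et al. The function name and example values are hypothetical.

```python
def quality_adjusted_days(days_survived, utility, horizon=30):
    """Days survived within the horizon, weighted by a quality-of-life utility (0 to 1)."""
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must lie between 0 and 1")
    if days_survived < 0:
        raise ValueError("days_survived must not be negative")
    return min(days_survived, horizon) * utility

# Invented examples for three patients.
full_health = quality_adjusted_days(30, 1.0)   # survived the window in full health
impaired = quality_adjusted_days(30, 0.5)      # survived, but with reduced quality of life
early_death = quality_adjusted_days(10, 0.75)  # died on day 10
print(full_health, impaired, early_death)
```

A model trained on such a target is rewarded for predicting not only whether patients survive but also how well they live during that time.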
Barriers to applying machine learning to PRO data are data quantity, data quality and standardization
These two studies show that PROs can be used both as a predictor and a criterion in machine learning models; however, there are only a few studies in the literature that combine machine learning methods with PRO data. In a systematic review of machine learning approaches in palliative care research, a field in which PROs should play an important role, no studies were found that included PROs [26].
Why are PROs seldom integrated into machine learning approaches?
Both machine learning and PROs are still comparatively new research areas which, until recently, did not have significant overlap. The nature of data and datasets used in PRO research can partially explain why PROs are still rarely integrated into machine learning approaches. Firstly, structured PROs are rarely collected during routine clinical care. Other kinds of clinical data, such as CT images, are collected more frequently and in a more standardized manner. Thus, while studies with CT images are possible using data from routine care, PRO data usually have to be collected with additional effort, for example, in clinical trials. As a result, PRO datasets are often comparatively small, which drastically reduces possible applications and advantages of machine learning. A second problem is that there are many different PRO instruments, which hinders harmonization of data. Quality of life data from patients collected with different questionnaires are only comparable to a very limited extent, because the questionnaires conceptualize and operationalize quality of life differently. In conclusion, the barriers to applying machine learning to PRO data in oncology lie in the quantity of data, data quality and standardization.
What developments are needed for machine learning to be used for PRO research?
In order to reap the benefits of machine learning methods in the context of PRO research, the broad and standardized collection of PRO data is needed. Projects that implement PROs in broad oncology practice already exist, for example in Canada (Ontario Cancer Registry [4, 5]). In participating oncology centers, patients complete a short symptom screening questionnaire during each visit. This implementation of PROs in routine clinical care results in larger datasets and enables analyses of 100,000–200,000 patients with PRO and clinical data [3, 4]. To date, however, no machine learning algorithms have been applied to those or any other comparable PRO datasets.
Analyses with larger datasets would be promising, because the predictive power of machine learning models increases with the amount of data and variables; however, there is also a significant amount of work involved in collecting and harmonizing PRO data. Data collection and analysis become complicated when PRO data are unstructured, when many data points are missing, or when PROs must be extracted from diverse clinical information systems. Consequently, initiatives that harmonize the collection of PRO data as clinical endpoints, or encourage their collection in clinical practice, are called for.
Conclusions for clinical practice
Machine learning can help to analyze the increasing amounts of data in oncology and can provide new insights into cancer. If the models are well developed and validated, they can support clinical work in the future, for example in cancer screening or by suggesting treatment options; however, current machine learning models are often too specialized and not sufficiently validated to be widely applicable.
For PRO research, machine learning analyses may be a promising and largely unexplored approach. With increasing quality and harmonization of larger datasets, a combination of PROs and clinical data should improve the power of prediction models for overall survival. As another application of machine learning, the individual prediction of expected quality of life, based on clinical parameters and PROs, could significantly enrich clinical practice.
References
Agius R, Brieghel C, Andersen MA et al (2020) Machine learning can identify newly diagnosed patients with CLL at high risk of infection. Nat Commun 11:363. https://doi.org/10.1038/s41467-019-14225-8
Arkin FS, Aras G, Dogu E (2020) Comparison of artificial neural networks and logistic regression for 30-days survival prediction of cancer patients. Acta Inform Med 28:108–113. https://doi.org/10.5455/aim.2020.28.108-113
Barbera L, Sutradhar R, Earle CC et al (2020) The impact of routine Edmonton symptom assessment system use on receiving palliative care services: results of a population-based retrospective-matched cohort analysis. BMJ Support Palliat Care. https://doi.org/10.1136/bmjspcare-2020-002220
Barbera L, Sutradhar R, Seow H et al (2020) Impact of standardized Edmonton symptom assessment system use on emergency department visits and hospitalization: results of a population-based retrospective matched cohort analysis. JCO Oncol Pract. https://doi.org/10.1200/JOP.19.00660
Basch E, Barbera L, Kerrigan CL, Velikova G (2018) Implementation of patient-reported outcomes in routine medical care. Am Soc Clin Oncol Educ Book 38:122–134. https://doi.org/10.1200/EDBK_200383
Bice N, Kirby N, Bahr T et al (2020) Deep learning-based survival analysis for brain metastasis patients with the national cancer database. J Appl Clin Med Phys 21:187–192. https://doi.org/10.1002/acm2.12995
Bzdok D, Altman N, Krzywinski M (2018) Statistics versus machine learning. Nat Methods 15:233–234. https://doi.org/10.1038/nmeth.4642
Echle A, Rindtorff NT, Brinker TJ et al (2020) Deep learning in cancer pathology: a new generation of clinical biomarkers. Br J Cancer. https://doi.org/10.1038/s41416-020-01122-x
Efficace F, Collins GS, Cottone F et al (2021) Patient-reported outcomes as independent prognostic factors for survival in oncology: systematic review and meta-analysis. Value Health 24(2):250–267. https://doi.org/10.1016/j.jval.2020.10.017
Elfiky AA, Pany MJ, Parikh RB, Obermeyer Z (2018) Development and application of a machine learning approach to assess short-term mortality risk among patients with cancer starting chemotherapy. JAMA Netw Open 1:e180926. https://doi.org/10.1001/jamanetworkopen.2018.0926
Food and Drug Administration (2009) Patient-reported outcome measures: use in medical product development to support labeling claims. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/patient-reported-outcome-measures-use-medical-product-development-support-labeling-claims. Accessed: 5 Aug 2020
Hou Z, Ren W, Li S et al (2017) Radiomic analysis in contrast-enhanced CT: predict treatment response to chemoradiotherapy in esophageal carcinoma. Oncotarget 8:104444–104454. https://doi.org/10.18632/oncotarget.22304
Husson O, Mols F, van de Poll-Franse LV (2011) The relation between information provision and health-related quality of life, anxiety and depression among cancer survivors: a systematic review. Ann Oncol 22:761–772. https://doi.org/10.1093/annonc/mdq413
Innovative Medicines Initiative (2021) Onco Track. Methods for systematic next generation oncology biomarker development. https://www.imi.europa.eu/projects-results/project-factsheets/onco-track. Accessed: 13 Jan 2021
Jansen L, Herrmann A, Stegmaier C et al (2011) Health-related quality of life during the 10 years after diagnosis of colorectal cancer: a population-based study. J Clin Oncol 29:3263–3269. https://doi.org/10.1200/JCO.2010.31.4013
Jemal A, Ward EM, Johnson CJ et al (2017) Annual report to the nation on the status of cancer, 1975–2014, featuring survival. J Natl Cancer Inst. https://doi.org/10.1093/jnci/djx030
McKinney SM, Sieniek M, Godbole V et al (2020) International evaluation of an AI system for breast cancer screening. Nature 577:89–94. https://doi.org/10.1038/s41586-019-1799-6
McRoy S, Rastegar-Mojarad M, Wang Y et al (2018) Assessing unmet information needs of breast cancer survivors: exploratory study of online health forums using text classification and retrieval. JMIR Cancer 4:e10. https://doi.org/10.2196/cancer.9050
Nagendran M, Chen Y, Lovejoy CA et al (2020) Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. https://doi.org/10.1136/bmj.m689
Nguyen D, Jia X, Sher D et al (2019) 3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U‑net deep learning architecture. Phys Med Biol 64:65020. https://doi.org/10.1088/1361-6560/ab039b
Nguyen D, Long T, Jia X et al (2019) A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning. Sci Rep 9:1076. https://doi.org/10.1038/s41598-018-37741-x
Russell SJ, Norvig P, Davis E, Edwards D (2016) Artificial intelligence: a modern approach, 3rd edn. Pearson, London
Saba L, Biswas M, Kuppili V et al (2019) The present and future of deep learning in radiology. Eur J Radiol 114:14–24. https://doi.org/10.1016/j.ejrad.2019.02.038
Santos HGD, Zampieri FG, Normilio-Silva K et al (2020) Machine learning to predict 30-day quality-adjusted survival in critically ill patients with cancer. J Crit Care 55:73–78. https://doi.org/10.1016/j.jcrc.2019.10.015
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003
Storick V, O’Herlihy A, Abdelhafeez S et al (2019) Improving palliative and end-of-life care with machine learning and routine data: a rapid review. HRB Open Res 2:13. https://doi.org/10.12688/hrbopenres.12923.2
Tamashiro A, Yoshio T, Ishiyama A et al (2020) Artificial intelligence-based detection of pharyngeal cancer using convolutional neural networks. Dig Endosc 32:1057–1065. https://doi.org/10.1111/den.13653
Tong Z, Liu Y, Ma H et al (2020) Development, validation and comparison of artificial neural network models and logistic regression models predicting survival of unresectable pancreatic cancer. Front Bioeng Biotechnol 8:196. https://doi.org/10.3389/fbioe.2020.00196
Wong NC, Lam C, Patterson L, Shayegan B (2019) Use of machine learning to predict early biochemical recurrence after robot-assisted prostatectomy. BJU Int 123:51–57. https://doi.org/10.1111/bju.14477
Funding
Open access funding provided by University of Innsbruck and Medical University of Innsbruck.
Ethics declarations
Conflict of interest
J. Lehmann, T. Cofala, M. Tschuggnall, J.M. Giesinger, G. Rumpold and B. Holzner declare that they have no competing interests.
For this article no studies with human participants or animals were performed by any of the authors. All studies performed were in accordance with the ethical standards indicated in each case.
The supplement containing this article is not sponsored by industry.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lehmann, J., Cofala, T., Tschuggnall, M. et al. Machine learning in oncology—Perspectives in patient-reported outcome research. Onkologe 27 (Suppl 2), 150–155 (2021). https://doi.org/10.1007/s00761-021-00916-9
DOI: https://doi.org/10.1007/s00761-021-00916-9