Abstract
Breast cancer is a significant global health concern, with millions of cases and deaths each year. Accurate diagnosis is critical for timely treatment and medication. Machine learning techniques have shown promising results in detecting breast cancer, but previous studies have primarily used single-modality data for diagnosis. This work therefore aims to leverage the benefits of multimodal data over unimodal samples, and proposes a custom deep learning-based pipeline that operates on multimodal data. The work is divided into three phases. Phases 1 and 2, in the unimodal category, examine gene expression data and histopathological images separately; both datasets are available from The Cancer Genome Atlas. In Phase 3, in the multimodal category, the proposed pipeline operates on samples of both data types for each patient. The study investigates how data pre-processing (cleaning, transformation, reduction) and cascaded filtering affect model performance. Models were assessed using precision, recall, F1-score, and accuracy, while L2 regularization, exponentially weighted moving average, and transfer learning were used to reduce over-fitting. A custom deep neural network and a support vector machine obtained 86% accuracy in Phase 1, whereas the VGG16 model reached 80.21% accuracy in Phase 2. In Phase 3, the curated multimodal dataset was applied to a custom deep learning pipeline (VGG16 backbone with hyper-tuned machine learning models as head classifiers) and achieved 94% accuracy. These findings highlight the importance of multimodal over unimodal data for breast cancer diagnosis and subtype prediction.
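The abstract names precision, recall, F1-score, and accuracy as the evaluation metrics. As a rough illustration of how these are typically computed for a multi-class subtype classifier, the following is a minimal sketch in plain Python. The function name `classification_report`, the subtype labels, and the toy predictions are all hypothetical and are not taken from the paper; the macro-averaging shown here is one common choice and may differ from what the authors used.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return accuracy plus per-class and macro-averaged precision, recall, F1.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    per_class = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        # Convention assumed here: a metric with a zero denominator is 0.0.
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    # Macro average: unweighted mean over classes.
    macro = {
        key: sum(m[key] for m in per_class.values()) / len(labels)
        for key in ("precision", "recall", "f1")
    }
    accuracy = sum(tp.values()) / len(y_true)
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro}


# Toy labels only; these are NOT results from the study.
y_true = ["LumA", "LumA", "LumB", "Basal", "Basal", "Her2"]
y_pred = ["LumA", "LumB", "LumB", "Basal", "Basal", "LumA"]

report = classification_report(y_true, y_pred)
print(f"accuracy = {report['accuracy']:.3f}")
for label, metrics in report["per_class"].items():
    print(label, {k: round(v, 3) for k, v in metrics.items()})
print("macro", {k: round(v, 3) for k, v in report["macro"].items()})
```

Reporting per-class values alongside accuracy matters when subtype classes are imbalanced, since a high overall accuracy can hide poor recall on a rare subtype.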
Data availability
The dataset analyzed during this study is available at https://portal.gdc.cancer.gov/projects/TCGA-BRCA.
Funding
There is no funding involved in writing this paper.
Ethics declarations
Conflict of interest
The authors declare no potential conflict of interest. All authors have seen the manuscript and approved its submission to the journal.
About this article
Cite this article
Rani, S., Ahmad, T., Masood, S. et al. Diagnosis of breast cancer molecular subtypes using machine learning models on unimodal and multimodal datasets. Neural Comput & Applic 35, 24109–24121 (2023). https://doi.org/10.1007/s00521-023-09005-x
DOI: https://doi.org/10.1007/s00521-023-09005-x