Diagnosis of breast cancer molecular subtypes using machine learning models on unimodal and multimodal datasets

  • Original Article
Neural Computing and Applications

Abstract

Breast cancer is a significant global health concern, with millions of new cases and hundreds of thousands of deaths each year. Accurate diagnosis is critical for timely treatment. Machine learning techniques have shown promising results in detecting breast cancer, but previous studies have primarily relied on single-modality data. This work therefore aims to exploit the benefits of multimodal data over unimodal data and proposes a custom deep learning-based pipeline that operates on multimodal data. The work is divided into three phases. Phases 1 and 2, in the unimodal category, examine gene expression data and histopathological images separately; both datasets are available from The Cancer Genome Atlas. In Phase 3, the multimodal category, the proposed pipeline operates on samples of both data types for each patient. The study investigates how data pre-processing (cleaning, transformation, reduction) and cascaded filtering affect model performance. Models were assessed using precision, recall, F1-score and accuracy, while L2 regularization, an exponentially weighted moving average and transfer learning were used to minimize over-fitting. A custom deep neural network and a support vector machine obtained 86% accuracy in Phase 1, whereas the VGG16 model reached 80.21% accuracy in Phase 2. In Phase 3, the curated multimodal dataset was applied to a custom deep learning pipeline (a VGG16 backbone with hyper-tuned machine learning models as head classifiers) and achieved 94% accuracy. These findings highlight the importance of multimodal over unimodal data for breast cancer diagnosis and subtype prediction.
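The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `subtype_report`, the toy labels and the choice of macro averaging across classes are all assumptions made for illustration, and the abstract does not state which averaging scheme the authors used.

```python
from collections import Counter


def subtype_report(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Hypothetical helper: macro averaging is an assumption, not a detail
    confirmed by the abstract.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy labels for illustration only; these are not results from the study.
y_true = ["LumA", "LumA", "LumB", "Basal", "HER2", "Basal"]
y_pred = ["LumA", "LumB", "LumB", "Basal", "HER2", "LumA"]
report = subtype_report(y_true, y_pred)
print(report)
```

Reporting macro-averaged scores alongside accuracy is one common choice when subtype classes are imbalanced, because accuracy alone can hide poor performance on rare classes.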

Data availability

The dataset analyzed during this study is available at https://portal.gdc.cancer.gov/projects/TCGA-BRCA.

Notes

  1. https://portal.gdc.cancer.gov/projects/TCGA-BRCA.

Funding

There is no funding involved in writing this paper.

Author information

Corresponding author

Correspondence to Samta Rani.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest. All authors have seen the manuscript and approved its submission to the journal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rani, S., Ahmad, T., Masood, S. et al. Diagnosis of breast cancer molecular subtypes using machine learning models on unimodal and multimodal datasets. Neural Comput & Applic 35, 24109–24121 (2023). https://doi.org/10.1007/s00521-023-09005-x

  • DOI: https://doi.org/10.1007/s00521-023-09005-x
