Diagnosis of breast cancer molecular subtypes using machine learning models on unimodal and multimodal datasets

Rani, Samta; Ahmad, Tanvir; Masood, Sarfaraz; Saxena, Chandni

doi:10.1007/s00521-023-09005-x

Original Article
Published: 19 September 2023

Volume 35, pages 24109–24121 (2023)
Neural Computing and Applications

Samta Rani¹, Tanvir Ahmad¹, Sarfaraz Masood¹ & Chandni Saxena²

Abstract

Breast cancer is a significant global health concern, with millions of cases and deaths each year. Accurate diagnosis is critical for timely treatment and medication. Machine learning techniques have shown promising results in detecting breast cancer, but previous studies have primarily used single-modality data for diagnosis. This work therefore aims to exploit the benefits of multimodal data over unimodal samples, and proposes a custom deep learning-based model pipeline that operates on such multimodal data. The work is divided into three phases. Phases 1 and 2, in the unimodal category, examine gene expression data and histopathological images separately; both datasets are available from The Cancer Genome Atlas. In Phase 3, in the multimodal category, the proposed pipeline operates on samples of both data types for each patient. The study investigates how data pre-processing (cleaning, transformation, reduction) and cascaded filtering affect model performance. The models were assessed using precision, recall, F1-score, and accuracy, while L2 regularization, exponentially weighted moving average, and transfer learning were used to reduce over-fitting. A custom deep neural network and a support vector machine obtained 86% accuracy in Phase 1, whereas the VGG16 model reached 80.21% accuracy in Phase 2. In Phase 3, the curated multimodal dataset was applied to a custom deep learning pipeline (VGG16 backbone with hyper-tuned machine learning models as head classifiers) and achieved 94% accuracy. These findings highlight the importance of multimodal over unimodal data for breast cancer diagnosis and subtype classification.
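The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-class scores were averaged, so macro-averaging is an assumption here, and the function name `macro_metrics` and the subtype labels in the demo are hypothetical placeholders rather than the authors' code or data.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 for multi-class labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Per-class counts of true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,  # macro average (assumed)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical subtype labels, for illustration only.
    truth = ["LumA", "LumA", "LumB", "HER2", "Basal", "Basal"]
    preds = ["LumA", "LumB", "LumB", "HER2", "Basal", "LumA"]
    for name, value in macro_metrics(truth, preds).items():
        print(f"{name}: {value:.3f}")
```

Macro-averaging weights every subtype equally, which matters when subtypes are imbalanced; a micro or weighted average would give different figures on the same predictions.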

Data availability

The dataset analyzed during this study is available at https://portal.gdc.cancer.gov/projects/TCGA-BRCA.

Funding

There is no funding involved in writing this paper.

Author information

Authors and Affiliations

Department of Computer Engineering, Jamia Millia Islamia University, New Delhi, India
Samta Rani, Tanvir Ahmad & Sarfaraz Masood
The Chinese University of Hong Kong, Sha Tin, Hong Kong, SAR, China
Chandni Saxena

Corresponding author

Correspondence to Samta Rani.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest. All authors have seen the manuscript and approved its submission to the journal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rani, S., Ahmad, T., Masood, S. et al. Diagnosis of breast cancer molecular subtypes using machine learning models on unimodal and multimodal datasets. Neural Comput & Applic 35, 24109–24121 (2023). https://doi.org/10.1007/s00521-023-09005-x

Received: 27 April 2023
Accepted: 22 August 2023
Published: 19 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00521-023-09005-x
