Abstract
Lung and colon cancers account for a significant share of cancer deaths. Their simultaneous occurrence is uncommon; however, in the absence of early diagnosis, the rate of metastasis between these two organs is very high. Currently, histopathological diagnosis followed by appropriate treatment is the only way to improve the chances of survival and reduce cancer mortality. Applying artificial intelligence to the histopathological diagnosis of colon and lung cancer can help specialists identify these cancers with less effort, time and cost. The objective of this study is to build a computer-aided diagnosis system that accurately classifies five types of colon and lung tissue (two colon tissue classes and three lung tissue classes) from their histopathological images. Using machine learning, feature engineering and image processing techniques, six models (XGBoost, support vector machine (SVM), random forest (RF), linear discriminant analysis (LDA), multilayer perceptron (MLP) and LightGBM) were used to classify histopathological images of lung and colon tissue acquired from the LC25000 dataset. The main advantage of these machine learning models is that, because they rely on engineered features, they make the classification model easier to interpret; deep learning models, by contrast, are black-box networks whose inner workings are very difficult to understand because of their complex design. The experimental results show that the machine learning models perform satisfactorily and identify the lung and colon cancer subtype classes with high precision. The XGBoost model gave the best performance, with an accuracy of 99% and an F1-score of 98.8%. Implementing and deploying this model can help healthcare specialists identify types of colon and lung cancer. The code will be available upon request.
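The abstract does not list which engineered features were extracted or how the F1-score was averaged. As a hedged illustration only, the following pure-Python sketch assumes two common families of hand-crafted descriptors (first-order intensity statistics and texture features from a gray-level co-occurrence matrix, GLCM) and a macro-averaged F1-score. The function names `first_order_features`, `glcm_features` and `accuracy_and_macro_f1`, the 8-level quantization, the horizontal distance-1 GLCM offset and the macro averaging are all assumptions, not the authors' implementation. In a pipeline like the one described, such feature vectors would be fed to classifiers such as XGBoost or LightGBM.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """First-order statistics of a flat list of grayscale intensities (0-255).

    Hypothetical helper: the study's exact feature set is not given in the abstract.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    if std > 0:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        # Pearson (non-excess) kurtosis.
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    else:  # constant patch: higher moments are undefined, report 0
        skewness = kurtosis = 0.0
    # Shannon entropy of the intensity histogram, in bits.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}


def glcm_features(image, levels=8):
    """Contrast, homogeneity and angular second moment (ASM) from a GLCM.

    Assumptions: `image` is a 2D list of 0-255 values, quantized to `levels`
    bins; only the horizontal neighbour (distance 1, angle 0) is counted.
    """
    q = [[min(v * levels // 256, levels - 1) for v in row] for row in image]
    glcm = [[0] * levels for _ in range(levels)]
    pairs = 0
    for row in q:
        for a, b in zip(row, row[1:]):  # (pixel, right neighbour) pairs
            glcm[a][b] += 1
            pairs += 1
    if pairs == 0:
        raise ValueError("image needs at least two columns")
    contrast = homogeneity = asm = 0.0
    for i in range(levels):
        for j in range(levels):
            p = glcm[i][j] / pairs  # normalized co-occurrence probability
            contrast += (i - j) ** 2 * p
            homogeneity += p / (1 + abs(i - j))
            asm += p ** 2
    return {"contrast": contrast, "homogeneity": homogeneity, "asm": asm}


def accuracy_and_macro_f1(y_true, y_pred):
    """Accuracy and macro-averaged F1 (averaging choice is an assumption)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return accuracy, sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy 2x2 "patch"; real LC25000 images are far larger RGB tiles.
    patch = [[0, 255], [0, 255]]
    flat = [v for row in patch for v in row]
    features = {**first_order_features(flat), **glcm_features(patch)}
    print(features)
    print(accuracy_and_macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Concatenating the two dictionaries gives one feature vector per image, which is what makes this style of model inspectable: each input dimension has a named, human-readable meaning, unlike the learned activations of a convolutional network.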
References
Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A (2018) Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68(6):394–424
Bermúdez A, Arranz-Salas I, Mercado S, López-Villodres JA, González V, Ríus F, Ortega MV, Alba C, Hierro I, Bermúdez D (2021) HER2-positive and microsatellite instability status in gastric cancer: clinicopathological implications. Diagnostics 11:944
Toğaçar M (2021) Disease type detection in lung and colon cancer images using the complement approach of inefficient sets. Comput Biol Med 137:104827. https://doi.org/10.1016/j.compbiomed.2021.104827
Sánchez-Peralta LF, Bote-Curiel L, Picón A, Sánchez-Margallo FM, Pagador JB (2020) Deep learning to find colorectal polyps in colonoscopy: a systematic literature review. Artif Intell Med 108:101923. https://doi.org/10.1016/j.artmed.2020.101923
Travis WD et al (2011) International association for the study of lung cancer/American thoracic society/European respiratory society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol 6:244–285. https://doi.org/10.1097/JTO.0b013e318206a221
Yu KH, Zhang C, Berry GJ, Altman RB, Ré C, Rubin DL, Snyder M (2016) Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun 7:12474. https://doi.org/10.1038/ncomms12474
Bazazeh D, Shubair R (2016) Comparative study of machine learning algorithms for breast cancer detection and diagnosis. In: 2016 5th international conference on electronic devices, systems and applications (ICEDSA), pp 1–4. https://doi.org/10.1109/ICEDSA.2016.7818560
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Bukhari SUK, Asmara S, Bokhari SKA, Hussain SS, Armaghan SU, Shah SSH (2020) The histological diagnosis of colonic adenocarcinoma by applying partial self supervised learning. https://doi.org/10.1101/2020.08.15.20175760
Hatuwal BK, Thapa HC (2020) Lung cancer detection using convolutional neural network on histopathological images. Int J Comput Trends Technol 68(10):21–24. https://doi.org/10.14445/22312803/IJCTT-V68I10P104
Nishio M, Nishio M, Jimbo N, Nakane K (2021) Homology-based image processing for automatic classification of histopathological images of lung tissue. Cancers 13:1192. https://doi.org/10.3390/cancers13061192
Masud M, Sikder N, Nahid AA, Bairagi AK, AlZain MA (2021) A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors 21:748. https://doi.org/10.3390/s21030748
Mangal S, Chaurasia A, Khajanchi A (2020) Convolution neural networks for diagnosing colon and lung cancer histopathological images. arXiv:2009.03878
Dargan S, Kumar M, Ayyagari MR et al (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng 27:1071–1092. https://doi.org/10.1007/s11831-019-09344-w
Borkowski AA, Bui MM, Thomas LB, Wilson CP, DeLand LA, Mastorides SM (2021) Lung and colon cancer histopathological images dataset. Kaggle. https://www.kaggle.com/andrewmvd/lung-and-colon-cancer-histopathological-images
Borkowski AA, Bui MM, Thomas LB, Wilson CP, DeLand LA, Mastorides SM (2019) Lung and colon cancer histopathological image dataset (LC25000) arXiv:1912.12142v1 [eess.IV].
Janowczyk A, Basavanhally A, Madabhushi A (2017) Stain normalization using sparse autoencoders (StaNoSA): application to digital pathology. Comput Med Imaging Graph 57:50–61. https://doi.org/10.1016/j.compmedimag.2016.05.003
Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan X, Schmitt C, Thomas NE (2009) A method for normalizing histology slides for quantitative analysis. In: IEEE international symposium on biomedical imaging, Boston, MA, pp 1107–1110
Vahadane A, Peng T, Sethi A, Albarqouni S, Wang L, Baust M, Steiger K, Schlitter AM, Esposito I, Navab N (2016) Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans Med Imaging 35(8):1962–1971. https://doi.org/10.1109/TMI.2016.2529665
Ciompi F, Geessink O, Bejnordi BE, de Souza GS, Baidoshvili A, Litjens G, Van Ginneken B, Nagtegaal I, Van Der Laak J (2017) The importance of stain normalization in colorectal tissue classification with convolutional networks. CoRR. arXiv:1702.05931
Lafarge MW, Pluim JPW, Eppenhof K, Moeskops P, Veta M (2017) Domain-adversarial neural networks to address the appearance variability of histopathology images. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, DLMIA, Québec City, QC pp 83–91
Alinsaif S, Lang J (2020) Texture features in the shearlet domain for histopathological image classification. BMC Med Inform Decis Mak 20(S14):1–19
Madero Orozco H, Vergara Villegas OO, Cruz Sánchez VG, Ochoa Domínguez H, Nandayapa Alfaro M (2015) Automated system for lung nodules classification based on wavelet feature descriptor and support vector machine. Biomed Eng Online 14(1):9
Aggarwal N, Agrawal RK (2012) First and second order statistics features for classification of magnetic resonance brain images. J Signal Inf Process 3(2):146–153. https://doi.org/10.4236/jsip.2012.32019
Li M, Ma X, Chen C, Yuan Y, Zhang S, Yan Z, Chen C, Chen F, Bai Y, Zhou P, et al (2021) Research on the auxiliary classification and diagnosis of lung cancer subtypes based on histopathological images. IEEE Access 9:53687–53707
Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. SN Comput Sci 2:160. https://doi.org/10.1007/s42979-021-00592-x
Molnar C (2019) Interpretable machine learning. A guide for making black box models explainable. https://christophm.github.io/interpretable-ml-book/
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Ethics declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This study uses only publicly available datasets cited in the references; ethical approval was therefore not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hage Chehade, A., Abdallah, N., Marion, JM. et al. Lung and colon cancer classification using medical imaging: a feature engineering approach. Phys Eng Sci Med 45, 729–746 (2022). https://doi.org/10.1007/s13246-022-01139-x
DOI: https://doi.org/10.1007/s13246-022-01139-x