Lung and colon cancer classification using medical imaging: a feature engineering approach

  • Scientific Paper
Physical and Engineering Sciences in Medicine

Abstract

Lung and colon cancers account for a significant portion of deaths. Their simultaneous occurrence is uncommon; however, in the absence of early diagnosis, the risk of cancer cells metastasizing between these two organs is very high. Currently, histopathological diagnosis and appropriate treatment are the only way to improve the chances of survival and reduce cancer mortality. Using artificial intelligence in the histopathological diagnosis of colon and lung cancer can help specialists identify cases with less effort, time and cost. The objective of this study is to set up a computer-aided diagnostic system that can accurately classify five types of colon and lung tissue (two classes for colon cancer and three classes for lung cancer) by analyzing their histopathological images. Using machine learning, feature engineering and image processing techniques, six models (XGBoost, SVM, RF, LDA, MLP and LightGBM) were used to classify histopathological images of lung and colon cancers from the LC25000 dataset. The main advantage of these machine learning models is better interpretability, since they are based on feature engineering; deep learning models, by contrast, are black-box networks whose workings are very difficult to understand because of their complex network design. The experimental results show that the machine learning models give satisfactory results and identify lung and colon cancer subtypes very precisely. The XGBoost model gave the best performance, with an accuracy of 99% and an F1-score of 98.8%. Implementing and developing this model will help healthcare specialists identify types of colon and lung cancers. The code will be available upon request.
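
To make the two ideas in the abstract concrete (hand-crafted features and the reported accuracy and F1-score), the following is a minimal sketch in plain Python, not the authors' pipeline. The choice of first-order statistics as features, the macro averaging of the F1-score, and the function names are all assumptions made for illustration; the abstract does not state which features or which averaging scheme were used.

```python
import math


def first_order_features(pixels):
    """Return [mean, variance, skewness, kurtosis] of a flat list of grey levels.

    Assumption: first-order statistics stand in for the paper's engineered
    features, which the abstract does not enumerate.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    if var == 0:
        # A constant patch has no spread, so higher moments are set to 0.
        return [mean, 0.0, 0.0, 0.0]
    std = math.sqrt(var)
    skew = sum(((p - mean) / std) ** 3 for p in pixels) / n
    kurt = sum(((p - mean) / std) ** 4 for p in pixels) / n
    return [mean, var, skew, kurt]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


if __name__ == "__main__":
    # Toy data only: a 2x2 grey-level patch and hypothetical labels "a" and "b".
    patch = [0, 0, 10, 10]
    print(first_order_features(patch))
    y_true = ["a", "a", "b", "b"]
    y_pred = ["a", "b", "b", "b"]
    print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred, ["a", "b"]))
```

In a real experiment, feature vectors of this kind would be computed per image and passed to a classifier such as those named in the abstract, with the metrics evaluated on held-out images.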

References

  1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A (2018) Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68(6):394–424

  2. Bermúdez A, Arranz-Salas I, Mercado S, López-Villodres JA, González V, Ríus F, Ortega MV, Alba C, Hierro I, Bermúdez D (2021) Her2-positive and microsatellite instability status in gastric cancer-clinicopathological implications. Diagnostics 11:944

  3. Togaçar M (2021) Disease type detection in lung and colon cancer images using the complement approach of inefficient sets. Comput Biol Med 137:104827. https://doi.org/10.1016/j.compbiomed.2021.104827

  4. Sánchez-Peralta LF, Bote-Curiel L, Picón A, Sánchez-Margallo FM, Pagador JB (2020) Deep learning to find colorectal polyps in colonoscopy: a systematic literature review. Artif Intell Med 108:101923. https://doi.org/10.1016/j.artmed.2020.101923

  5. Travis WD et al (2011) International association for the study of lung cancer/American thoracic society/European respiratory society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol 6:244–85. https://doi.org/10.1097/JTO.0b013e318206a221

  6. Yu KH, Zhang C, Berry GJ, Altman RB, Ré C, Rubin DL, Snyder M (2016) Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun 7:12474. https://doi.org/10.1038/ncomms12474

  7. Bazazeh D, Shubair R (2016) Comparative study of machine learning algorithms for breast cancer detection and diagnosis. In: 2016 5th international conference on electronic devices, systems and applications (ICEDSA), pp 1–4. https://doi.org/10.1109/ICEDSA.2016.7818560

  8. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117

  9. Bukhari SUK, Asmara S, Bokhari SKA, Hussain SS, Armaghan SU, Shah SSH (2020) The histological diagnosis of colonic adenocarcinoma by applying partial self supervised learning. https://doi.org/10.1101/2020.08.15.20175760

  10. Hatuwal BK, Thapa HC (2020) Lung cancer detection using convolutional neural network on histopathological images. Int J Comput Trends Technol 68(10):21–24. https://doi.org/10.14445/22312803/IJCTT-V68I10P104

  11. Nishio M, Nishio M, Jimbo N, Nakane K (2021) Homology-based image processing for automatic classification of histopathological images of lung tissue. Cancers 13:1192. https://doi.org/10.3390/cancers13061192

  12. Masud M, Sikder N, Nahid AA, Bairagi AK, AlZain MA (2021) A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors 21:748. https://doi.org/10.3390/s21030748

  13. Mangal S, Chaurasia A, Khajanchi A (2020) Convolution neural networks for diagnosing colon and lung cancer histopathological images. arXiv:2009.03878

  14. Dargan S, Kumar M, Ayyagari MR et al (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng 27:1071–1092. https://doi.org/10.1007/s11831-019-09344-w

  15. Borkowski AA, Bui MM, Thomas LB, Wilson CP, DeLand LA, Mastorides SM (2021) Lung and colon cancer histopathological images dataset. Kaggle. https://www.kaggle.com/andrewmvd/lung-and-colon-cancer-histopathological-images

  16. Borkowski AA, Bui MM, Thomas LB, Wilson CP, DeLand LA, Mastorides SM (2019) Lung and colon cancer histopathological image dataset (LC25000). arXiv:1912.12142v1 [eess.IV]

  17. Janowczyk A, Basavanhally A, Madabhushi A (2017) Stain normalization using sparse autoEncoders (StaNoSA): application to digital pathology. Comput Med Imaging Graph 57:50–61. https://doi.org/10.1016/j.compmedimag.2016.05.003

  18. Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan X, Schmitt C, Thomas NE (2009) A method for normalizing histology slides for quantitative analysis. In: IEEE international symposium on biomedical imaging, Boston, MA, pp 1107–1110

  19. Vahadane A, Peng T, Sethi A, Albarqouni S, Wang L, Baust M, Steiger K, Schlitter AM, Esposito I, Navab N (2016) Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans Med Imaging 35(8):1962–1971. https://doi.org/10.1109/TMI.2016.2529665

  20. Ciompi F, Geessink O, Bejnordi BE, de Souza GS, Baidoshvili A, Litjens G, Van Ginneken B, Nagtegaal I, Van Der Laak J (2017) The importance of stain normalization in colorectal tissue classification with convolutional networks. CoRR. arXiv:1702.05931

  21. Lafarge MW, Pluim JPW, Eppenhof K, Moeskops P, Veta M (2017) Domain-adversarial neural networks to address the appearance variability of histopathology images. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, DLMIA, Québec City, QC pp 83–91

  22. Alinsaif S, Lang J (2020) Texture features in the shearlet domain for histopathological image classification. BMC Med Informat Decis Making 20(S14):1–19

  23. Madero Orozco H, Vergara Villegas OO, Cruz Sánchez VG, Ochoa Domínguez H, Nandayapa Alfaro M (2015) An automated systems for lungs nodule classifications based on wavelet feature descriptors and support-vector-machines. Biomed Eng Online 14(1):9

  24. Aggarwal N, Agrawal RK (2012) First and second order statistics features for classification of magnetic resonance brain images. J Signal Inf Process 3(2):146–153. https://doi.org/10.4236/jsip.2012.32019

  25. Li M, Ma X, Chen C, Yuan Y, Zhang S, Yan Z, Chen C, Chen F, Bai Y, Zhou P, et al (2021) Research on the auxiliary classification and diagnosis of lung cancer subtypes based on histopathological images. IEEE Access 9:53687–53707

  26. Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. SN Comput Sci 2:160. https://doi.org/10.1007/s42979-021-00592-x

  27. Molnar C (2019) Interpretable machine learning. A guide for making black box models explainable. https://christophm.github.io/interpretable-ml-book/

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Aya Hage Chehade.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This study uses public databases cited in the references and therefore this section is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hage Chehade, A., Abdallah, N., Marion, JM. et al. Lung and colon cancer classification using medical imaging: a feature engineering approach. Phys Eng Sci Med 45, 729–746 (2022). https://doi.org/10.1007/s13246-022-01139-x

  • DOI: https://doi.org/10.1007/s13246-022-01139-x
