A feature engineering-based machine learning technique to detect and classify lung and colon cancer from histopathological images

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

Lung and colon cancers are among the most prevalent and lethal cancers worldwide, and early identification is essential to improving the likelihood of survival. Histopathological images are an appropriate tool for diagnosing cancer, but examining them manually is tedious and error-prone. Recently, machine learning methods based on feature engineering have gained prominence in automatic histopathological image classification. These methods are also more interpretable than deep learning, which operates in a “black box” manner. In medicine, the interpretability of a technique is critical to gaining the trust of the end users who must adopt it. This work therefore aims to create an accurate and interpretable machine learning technique for the automated classification of lung and colon cancers from histopathology images. In the proposed approach, following the preprocessing steps, texture and color features are extracted using the Haralick and color histogram feature extraction algorithms, respectively, and the two are concatenated to form a combined feature set. Each of the three feature sets (texture, color, and combined) is passed to a Light Gradient Boosting Machine (LightGBM) classifier, and performance is evaluated on the LC25000 dataset using hold-out and stratified 10-fold cross-validation (Stratified 10-FCV) techniques. On the hold-out test set, LightGBM with texture, color, and combined features classifies the lung and colon cancer images with 97.72%, 99.92%, and 100% accuracy, respectively. Stratified 10-fold cross-validation likewise shows that LightGBM with the combined or color features performs well, with a high mean auc_mu score and a low mean multi_logloss value. The proposed technique can thus help histologists detect and classify lung and colon histopathology images more efficiently, effectively, and economically, improving their productivity.
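
The feature extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the preprocessing steps, the number of gray levels, the co-occurrence offsets, the histogram bin count, or which of Haralick's statistics were used. The function names (`glcm`, `haralick_subset`, `color_histogram`, `extract_features`), the 8-level quantization, the single horizontal offset, the four chosen statistics, and the 8 bins per channel are all hypothetical choices made for the example.

```python
import numpy as np


def glcm(gray, levels=8, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    # Quantize 0..255 intensities into `levels` gray levels (assumed value).
    q = (gray.astype(np.int64) * levels) // 256
    dr, dc = offset  # this sketch assumes non-negative offsets
    rows, cols = q.shape
    # Pair every pixel with its neighbor at (dr, dc).
    a = q[: rows - dr, : cols - dc].ravel()
    b = q[dr:, dc:].ravel()
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a, b), 1.0)
    m += m.T  # count each pair in both directions
    return m / m.sum()


def haralick_subset(p):
    """Four of Haralick's texture statistics computed from a GLCM `p`."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)  # angular second moment
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))  # inverse difference moment
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([contrast, energy, homogeneity, entropy])


def color_histogram(rgb, bins=8):
    """Per-channel intensity histograms, each normalized to sum to 1."""
    feats = []
    for c in range(3):
        h, _ = np.histogram(rgb[..., c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())
    return np.concatenate(feats)


def extract_features(rgb):
    """Return the texture, color, and combined feature vectors for one image."""
    gray = rgb.mean(axis=2)  # simple channel average as a grayscale stand-in
    texture = haralick_subset(glcm(gray))
    color = color_histogram(rgb)
    return texture, color, np.concatenate([texture, color])


# Synthetic stand-in for a histopathology tile (not real data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
texture, color, combined = extract_features(image)
print(texture.shape, color.shape, combined.shape)  # (4,) (24,) (28,)
```

Stacking such vectors over a labeled image set would give the three feature matrices that the abstract describes passing to LightGBM. One plausible way to do that, again as an assumption, is `lightgbm.LGBMClassifier` evaluated with scikit-learn's `StratifiedKFold(n_splits=10)`.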

Data availability

The datasets used during the current study are publicly available at https://github.com/tampapath/lung_colon_image_set.

References

  1. Adu K, Yu Y, Cai J et al (2021) DHS-CapsNet: dual horizontal squash capsule networks for lung and colon cancer classification from whole slide histopathological images. Int J Imaging Syst Technol 31:2075–2092. https://doi.org/10.1002/ima.22569

  2. Mehmood S, Ghazal TM, Khan MA et al (2022) Malignancy detection in lung and colon histopathology images using transfer learning with class selective image processing. IEEE Access 10:25657–25668. https://doi.org/10.1109/ACCESS.2022.3150924

  3. Lung Cancer Statistics | How common is lung cancer? https://www.cancer.org/cancer/lung-cancer/about/key-statistics.html. Accessed 11 Jan 2023

  4. Attallah O, Aslan MF, Sabanci K (2022) A framework for lung and colon cancer diagnosis via lightweight deep learning models and transformation methods. Diagnostics 12:2926. https://doi.org/10.3390/diagnostics12122926

  5. Kurishima K, Miyazaki K, Watanabe H et al (2017) Lung cancer patients with synchronous colon cancer. Mol Clin Oncol 8:137–140. https://doi.org/10.3892/mco.2017.1471

  6. Marquette CH, Copin MC, Wallet F et al (1995) Diagnostic tests for pneumonia in ventilated patients: prospective evaluation of diagnostic accuracy using histology as a diagnostic gold standard. Am J Respir Crit Care Med 151:. https://doi.org/10.1164/ajrccm.151.6.7767535

  7. Masud M, Sikder N, Nahid A et al (2021) A machine learning approach to diagnosing lung and colon cancer using a deep learning‐based classification framework. Sensors (Switzerland) 21:. https://doi.org/10.3390/s21030748

  8. Jena B, Saxena S, Nayak GK et al (2021) Artificial intelligence-based hybrid deep learning models for image classification: the first narrative review. Comput Biol Med 137:104803

  9. Hosny A, Parmar C, Quackenbush J et al (2018) Artificial intelligence in radiology. Nat Rev Cancer 18:500–510

  10. Gao Y, Geras KJ, Lewin AA, Moy L (2019) New frontiers: an update on computer-aided diagnosis for breast imaging in the age of artificial intelligence. Am J Roentgenol 212:300

  11. Bianconi F, Fernández A (2007) Evaluation of the effects of Gabor filter parameters on texture classification. Pattern Recognit 40:3325–3335. https://doi.org/10.1016/j.patcog.2007.04.023

  12. Pang Y, Yan H, Yuan Y, Wang K (2012) Robust CoHOG feature extraction in human-centered image/video management system. IEEE Trans Syst Man Cybern B Cybern 42:458–468. https://doi.org/10.1109/TSMCB.2011.2167750

  13. Haralick RM, Dinstein I, Shanmugam K (1973) Textural features for image classification. IEEE Trans Syst Man Cybern SMC-3:610–621. https://doi.org/10.1109/TSMC.1973.4309314

  14. Ojala T, Pietikäinen M, Mäenpää T (2001) A generalized local binary pattern operator for multiresolution gray scale and rotation invariant texture classification. Advances in Pattern Recognition—ICAPR 2001: Second International Conference Rio de Janeiro, Brazil, March 11–14, 2001 Proceedings 2. Springer, Berlin Heidelberg, pp 399–408

  15. Choudhury A, Gupta D (2019) A Survey on Medical Diagnosis of Diabetes Using Machine Learning Techniques. In: Recent Developments in Machine Learning and Data Analytics: IC3 2018. Springer, Singapore, pp 67–78

  16. Schmidhuber J (2015) Deep Learning in neural networks: an overview. Neural Netw 61:85–117

  17. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2324. https://doi.org/10.1109/5.726791

  18. Anwar SM, Majid M, Qayyum A et al (2018) Medical image analysis using convolutional neural networks: a review. J Med Syst 42:1–13

  19. Yamashita R, Nishio M, Do RKG, Togashi K (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9:611–629

  20. Yang R, Yu Y (2021) Artificial convolutional neural network in object detection and semantic segmentation for medical imaging analysis. Front Oncol 11:638182

  21. Zhai J, Zhang S, Chen J, He Q (2018) Autoencoder and Its Various Variants. In: 2018 IEEE International conference on systems, man, and cybernetics (SMC). IEEE, pp 415–419

  22. Goodfellow I, Pouget-Abadie J, Mirza M et al (2020) Generative adversarial networks. Commun ACM 63:139–144. https://doi.org/10.1145/3422622

  23. Elman JL (1990) Finding structure in time. Cogn Sci 14:179–211. https://doi.org/10.1016/0364-0213(90)90002-E

  24. Creswell A, Bharath AA (2019) Denoising adversarial autoencoders. IEEE Trans Neural Netw Learn Syst 30:968–984. https://doi.org/10.1109/TNNLS.2018.2852738

  25. Lee H, Chen YPP (2015) Image based computer aided diagnosis system for cancer detection. Expert Syst Appl 42:5356–5365

  26. Mangal S, Chaurasia A, Khajanchi A (2020) Convolution Neural Networks for diagnosing colon and lung cancer histopathological images. arXiv preprint arXiv:2009.03878.

  27. Ali M, Ali R (2021) Multi-input dual-stream capsule network for improved lung and colon cancer classification. Diagnostics 11:. https://doi.org/10.3390/diagnostics11081485

  28. Kumar N, Sharma M, Singh VP et al (2022) An empirical study of handcrafted and dense feature extraction techniques for lung and colon cancer classification from histopathological images. Biomed Signal Process Control 75:. https://doi.org/10.1016/j.bspc.2022.103596

  29. Yahia Ibrahim N, Talaat AS (2022) An Enhancement Technique to Diagnose Colon and Lung Cancer by using Double CLAHE and Deep Learning. Int J Adv Comput Sci Appl (IJACSA) 13. https://doi.org/10.14569/IJACSA.2022.0130833

  30. Fan J, Lee J, Lee Y (2021) A transfer learning architecture based on a support vector machine for histopathology image classification. Appl Sci (Switzerland) 11:6380. https://doi.org/10.3390/app11146380

  31. Talukder MdA, Islam MdM, Uddin MA et al (2022) Machine learning-based lung and colon cancer detection using deep feature extraction and ensemble learning. Expert Systems with Applications 205:117695

  32. Hage Chehade A, Abdallah N, Marion JM et al (2022) Lung and colon cancer classification using medical imaging: a feature engineering approach. Phys Eng Sci Med 45:729–746. https://doi.org/10.1007/s13246-022-01139-x

  33. Aitazaz T, Tubaishat A, Al-Obeidat F et al (2023) Transfer learning for histopathology images: an empirical study. Neural Comput Appl 35:7963–7974. https://doi.org/10.1007/s00521-022-07516-7

  34. Saric M, Russo M, Stella M (2019) CNN-based method for lung cancer detection in whole slide histopathology images. In: 2019 4th International Conference on Smart and Sustainable Technologies (SpliTech). IEEE, pp 1–4

  35. Tjoa E, Guan C (2021) A survey on Explainable Artificial Intelligence (XAI): toward medical XAI. IEEE Trans Neural Netw Learn Syst 32:4793–4813. https://doi.org/10.1109/TNNLS.2020.3027314

  36. Samek W, Wiegand T, Müller KR (2017) Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296

  37. O’Mahony N, Campbell S, Carvalho A et al (2020) Deep Learning vs. Traditional Computer Vision. In: Advances in Computer Vision: Proceedings of the 2019 Computer Vision Conference (CVC). Springer, pp 128–144

  38. Borkowski AA, Bui MM, Brannon Thomas L et al (2019) Lung and Colon Cancer Histopathological Image Dataset (LC25000). arXiv preprint arXiv:1912.12142

  39. Swain MJ, Ballard DH (1992) Indexing via color histograms. In: Active perception and robot vision. Springer, pp 261–273

  40. Ke G, Meng Q, Finley T et al (2017) LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30

  41. Zachary J, Iyengar SS, Barhen J (2001) Content based image retrieval and information theory: a general approach. J Am Soc Inform Sci Technol 52:840–852. https://doi.org/10.1002/asi.1138

  42. Nagarajan G, Minu RI, Muthukumar B et al (2016) Hybrid genetic algorithm for medical image feature extraction and selection. Procedia Comput Sci 85:455–462. https://doi.org/10.1016/j.procs.2016.05.192

  43. Aoulalay A, El Makhfi N, Abounaima MC, Massar M (2020) Classification of Moroccan decorative patterns based on machine learning algorithms. In: 2020 IEEE 2nd International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS). IEEE, pp 1–7

  44. Alamdar F, Keyvanpour MR (2011) A new color feature extraction method based on QuadHistogram. Procedia Environ Sci 10:777–783. https://doi.org/10.1016/j.proenv.2011.09.126

  45. Yue J, Li Z, Liu L, Fu Z (2011) Content-based image retrieval using color and texture fused features. Math Comput Model 54:1121–1127. https://doi.org/10.1016/j.mcm.2010.11.044

  46. Ruela M, Barata C, Mendonça T, Marques JS (2013) What is the role of color in dermoscopy analysis? In: Pattern Recognition and Image Analysis: 6th Iberian Conference, IbPRIA 2013, Funchal, Madeira, Portugal, June 5-7, 2013. Proceedings, vol 6. Springer, pp 819–826

  47. Liang J, Bu Y, Tan K et al (2022) Estimation of stellar atmospheric parameters with light gradient boosting machine algorithm and principal component analysis. Astron J 163:153. https://doi.org/10.3847/1538-3881/ac4d97

  48. Alzamzami F, Hoda M, El SA (2020) Light gradient boosting machine for general sentiment classification on short texts: a comparative evaluation. IEEE Access 8:101840–101858. https://doi.org/10.1109/ACCESS.2020.2997330

  49. Ayubkhan SAH, Yap W-S, Morris E, Rawthar MBK (2022) A practical intrusion detection system based on denoising autoencoder and LightGBM classifier with improved detection performance. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-022-04449-w

  50. Zhang J, Mucs D, Norinder U, Svensson F (2019) LightGBM: an effective and scalable algorithm for prediction of chemical toxicity-application to the Tox21 and mutagenicity data sets. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.9b00633

  51. Devroye LP, Wagner TJ (1979) Distribution-free performance bounds for potential function rules. IEEE Trans Inf Theory 25:601–604. https://doi.org/10.1109/TIT.1979.1056087

  52. Xiong Z, Cui Y, Liu Z et al (2020) Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation. Comput Mater Sci 171:109203. https://doi.org/10.1016/j.commatsci.2019.109203

  53. Kärkkäinen T (2014) On cross-validation for MLP model evaluation. In: Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshop, S+ SSPR 2014, Joensuu, Finland, August 20-22, 2014. Proceedings. Springer, pp 291–300

  54. Yadav S, Shukla S (2016) Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. In: 2016 IEEE 6th International conference on advanced computing (IACC). IEEE, pp 78–83

  55. Berrar D (2019) Cross-Validation. In: Ranganathan Shoba, Gribskov Michael, Nakai Kenta, Schönbach Christian (eds) Encyclopedia of Bioinformatics and Computational Biology. Academic Press, Oxford, pp 542–545

  56. Arjaria SK, Rathore AS, Cherian JS (2021) Kidney disease prediction using a machine learning approach: A comparative and comprehensive analysis. In: Demystifying big data, machine learning, and deep learning for healthcare analytics. Academic Press, pp 307–333

  57. Shafiullah M, Abido MA, Al-Mohammed AH (2022) Chapter 11 - Smart grid fault diagnosis under load and renewable energy uncertainty. In: Power System Fault Diagnosis. Elsevier, pp 293–346

  58. Toğaçar M (2021) Disease type detection in lung and colon cancer images using the complement approach of inefficient sets. Comput Biol Med 137:104827. https://doi.org/10.1016/j.compbiomed.2021.104827

Author information

Corresponding author

Correspondence to Indu Chhillar.

Ethics declarations

Ethics approval

This study uses public databases cited in the references and therefore this section is not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chhillar, I., Singh, A. A feature engineering-based machine learning technique to detect and classify lung and colon cancer from histopathological images. Med Biol Eng Comput 62, 913–924 (2024). https://doi.org/10.1007/s11517-023-02984-y

  • DOI: https://doi.org/10.1007/s11517-023-02984-y
