A feature engineering-based machine learning technique to detect and classify lung and colon cancer from histopathological images

Chhillar, Indu; Singh, Ajmer

doi:10.1007/s11517-023-02984-y

Original Article
Published: 13 December 2023

Medical & Biological Engineering & Computing, Volume 62, pages 913–924 (2024)

Indu Chhillar¹ & Ajmer Singh¹

Abstract

Globally, lung and colon cancers are among the most prevalent and lethal tumors. Early identification is essential to improve the likelihood of survival. Histopathological images are considered an appropriate tool for diagnosing cancer, but examining them manually is tedious and error-prone. Recently, machine learning methods based on feature engineering have gained prominence in automatic histopathological image classification. These methods are also more interpretable than deep learning, which operates in a “black box” manner. In medicine, the interpretability of a technique is critical to gaining the trust of the end users who must adopt it. Accordingly, this work aims to create an accurate and interpretable machine learning technique for the automated classification of lung and colon cancers from histopathology images. In the proposed approach, following the preprocessing steps, texture and color features are extracted using the Haralick and color histogram feature extraction algorithms, respectively. The extracted features are then concatenated to form a single combined feature set. Each of the three feature sets (texture, color, and combined) is passed to a Light Gradient Boosting Machine (LightGBM) classifier, and performance is evaluated on the LC25000 dataset using hold-out and stratified 10-fold cross-validation (Stratified 10-FCV) techniques. On the hold-out test set, LightGBM with texture, color, and combined features classifies the lung and colon cancer images with 97.72%, 99.92%, and 100% accuracy, respectively. Stratified 10-fold cross-validation likewise showed that LightGBM with combined or color features performed well, with an excellent mean auc_mu score and a low mean multi_logloss value. Thus, the proposed technique can help histologists detect and classify lung and colon histopathology images more efficiently, effectively, and economically.
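The pipeline summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the choice of four Haralick statistics (angular second moment, contrast, inverse difference moment, entropy), the single horizontal pixel offset, the per-channel histogram layout, and the bin and gray-level counts are all assumptions made for illustration. The LightGBM classifier and the auc_mu metric are omitted; only the feature construction, a stratified fold assignment, and the multi-class log loss are shown.

```python
import math
from collections import defaultdict

def glcm(gray, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    height, width = len(gray), len(gray[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                a, b = gray[y][x], gray[ny][nx]
                counts[a][b] += 1.0
                counts[b][a] += 1.0  # count both directions so the matrix is symmetric
    total = sum(map(sum, counts)) or 1.0  # guard against images with no valid pairs
    return [[c / total for c in row] for row in counts]

def texture_features(p):
    """Four Haralick statistics (illustrative subset): ASM, contrast, IDM, entropy."""
    asm = contrast = idm = entropy = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            asm += v * v
            contrast += (i - j) ** 2 * v
            idm += v / (1 + (i - j) ** 2)
            if v > 0:
                entropy -= v * math.log(v)
    return [asm, contrast, idm, entropy]

def color_histogram(rgb, bins=4):
    """Per-channel histogram of 8-bit RGB pixels, normalized by pixel count."""
    hist = [0.0] * (3 * bins)
    pixels = [px for row in rgb for px in row]
    for px in pixels:
        for channel, value in enumerate(px):
            hist[channel * bins + value * bins // 256] += 1.0
    return [h / len(pixels) for h in hist]

def to_gray_levels(rgb, levels):
    """Quantize the mean channel intensity of each pixel into `levels` gray levels."""
    return [[(sum(px) // 3) * levels // 256 for px in row] for row in rgb]

def combined_features(rgb, levels=4, bins=4):
    """Concatenate texture and color features into one feature vector."""
    gray = to_gray_levels(rgb, levels)
    return texture_features(glcm(gray, levels)) + color_histogram(rgb, bins)

def stratified_folds(labels, k):
    """Give each sample a fold id so every class is spread evenly over k folds."""
    seen = defaultdict(int)
    fold_of = []
    for label in labels:
        fold_of.append(seen[label] % k)
        seen[label] += 1
    return fold_of

def multi_logloss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class index."""
    return -sum(math.log(max(p[y], eps)) for y, p in zip(y_true, probs)) / len(y_true)

if __name__ == "__main__":
    # Toy 2x2 RGB "image"; real inputs would be histopathology tiles.
    image = [[(255, 0, 0), (255, 0, 0)], [(0, 0, 255), (0, 0, 255)]]
    print(combined_features(image))
    print(stratified_folds(["lung", "colon"] * 5, k=5))
```

In practice, the combined vector for each image would be passed to a gradient-boosting classifier, with one model trained per fold on the samples whose fold id differs from the held-out fold. Established libraries would normally replace these hand-written loops.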


Data availability

The dataset used in the current study is publicly available from https://github.com/tampapath/lung_colon_image_set.

References

Adu K, Yu Y, Cai J et al (2021) DHS-CapsNet: dual horizontal squash capsule networks for lung and colon cancer classification from whole slide histopathological images. Int J Imaging Syst Technol 31:2075–2092. https://doi.org/10.1002/ima.22569
Mehmood S, Ghazal TM, Khan MA et al (2022) Malignancy detection in lung and colon histopathology images using transfer learning with class selective image processing. IEEE Access 10:25657–25668. https://doi.org/10.1109/ACCESS.2022.3150924
Lung Cancer Statistics | How common is lung cancer? https://www.cancer.org/cancer/lung-cancer/about/key-statistics.html. Accessed 11 Jan 2023
Attallah O, Aslan MF, Sabanci K (2022) A framework for lung and colon cancer diagnosis via lightweight deep learning models and transformation methods. Diagnostics 12:2926. https://doi.org/10.3390/diagnostics12122926
Kurishima K, Miyazaki K, Watanabe H et al (2017) Lung cancer patients with synchronous colon cancer. Mol Clin Oncol 8:137–140. https://doi.org/10.3892/mco.2017.1471
Marquette CH, Copin MC, Wallet F et al (1995) Diagnostic tests for pneumonia in ventilated patients: prospective evaluation of diagnostic accuracy using histology as a diagnostic gold standard. Am J Respir Crit Care Med 151:. https://doi.org/10.1164/ajrccm.151.6.7767535
Masud M, Sikder N, Nahid A et al (2021) A machine learning approach to diagnosing lung and colon cancer using a deep learning‐based classification framework. Sensors (Switzerland) 21:. https://doi.org/10.3390/s21030748
Jena B, Saxena S, Nayak GK et al (2021) Artificial intelligence-based hybrid deep learning models for image classification: the first narrative review. Comput Biol Med 137:104803
Hosny A, Parmar C, Quackenbush J et al (2018) Artificial intelligence in radiology. Nat Rev Cancer 18:500–510
Gao Y, Geras KJ, Lewin AA, Moy L (2019) New frontiers: an update on computer-aided diagnosis for breast imaging in the age of artificial intelligence. Am J Roentgenol 212:300
Bianconi F, Fernández A (2007) Evaluation of the effects of Gabor filter parameters on texture classification. Pattern Recognit 40:3325–3335. https://doi.org/10.1016/j.patcog.2007.04.023
Pang Y, Yan H, Yuan Y, Wang K (2012) Robust CoHOG feature extraction in human-centered image/video management system. IEEE Trans Syst Man Cybern B Cybern 42:458–468. https://doi.org/10.1109/TSMCB.2011.2167750
Haralick RM, Dinstein I, Shanmugam K (1973) Textural features for image classification. IEEE Trans Syst Man Cybern SMC-3:610–621. https://doi.org/10.1109/TSMC.1973.4309314
Ojala T, Pietikäinen M, Mäenpää T (2001) A generalized local binary pattern operator for multiresolution gray scale and rotation invariant texture classification. Advances in Pattern Recognition—ICAPR 2001: Second International Conference Rio de Janeiro, Brazil, March 11–14, 2001 Proceedings 2. Springer, Berlin Heidelberg, pp 399–408
Choudhury A, Gupta D (2019) A Survey on Medical Diagnosis of Diabetes Using Machine Learning Techniques. In: Recent Developments in Machine Learning and Data Analytics: IC3 2018. Springer, Singapore, pp 67–78
Schmidhuber J (2015) Deep Learning in neural networks: an overview. Neural Netw 61:85–117
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2324. https://doi.org/10.1109/5.726791
Anwar SM, Majid M, Qayyum A et al (2018) Medical image analysis using convolutional neural networks: a review. J Med Syst 42:1–13
Yamashita R, Nishio M, Do RKG, Togashi K (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9:611–629
Yang R, Yu Y (2021) Artificial convolutional neural network in object detection and semantic segmentation for medical imaging analysis. Front Oncol 11:638182
Zhai J, Zhang S, Chen J, He Q (2018) Autoencoder and Its Various Variants. In: 2018 IEEE International conference on systems, man, and cybernetics (SMC). IEEE, pp 415–419
Goodfellow I, Pouget-Abadie J, Mirza M et al (2020) Generative adversarial networks. Commun ACM 63:139–144. https://doi.org/10.1145/3422622
Elman JL (1990) Finding structure in time. Cogn Sci 14:179–211. https://doi.org/10.1016/0364-0213(90)90002-E
Creswell A, Bharath AA (2019) Denoising adversarial autoencoders. IEEE Trans Neural Netw Learn Syst 30:968–984. https://doi.org/10.1109/TNNLS.2018.2852738
Lee H, Chen YPP (2015) Image based computer aided diagnosis system for cancer detection. Expert Syst Appl 42:5356–5365
Mangal S, Chaurasia A, Khajanchi A (2020) Convolution Neural Networks for diagnosing colon and lung cancer histopathological images. arXiv preprint arXiv:2009.03878.
Ali M, Ali R (2021) Multi-input dual-stream capsule network for improved lung and colon cancer classification. Diagnostics 11:. https://doi.org/10.3390/diagnostics11081485
Kumar N, Sharma M, Singh VP et al (2022) An empirical study of handcrafted and dense feature extraction techniques for lung and colon cancer classification from histopathological images. Biomed Signal Process Control 75:. https://doi.org/10.1016/j.bspc.2022.103596
Yahia Ibrahim N, Talaat AS (2022) An Enhancement Technique to Diagnose Colon and Lung Cancer by using Double CLAHE and Deep Learning. Int J Adv Comput Sci Appl (IJACSA) 13. https://doi.org/10.14569/IJACSA.2022.0130833
Fan J, Lee J, Lee Y (2021) A transfer learning architecture based on a support vector machine for histopathology image classification. Appl Sci (Switzerland) 11:6380. https://doi.org/10.3390/app11146380
Talukder MdA, Islam MdM, Uddin MA et al (2022) Machine learning-based lung and colon cancer detection using deep feature extraction and ensemble learning. Expert Syst Appl 205:117695
Hage Chehade A, Abdallah N, Marion JM et al (2022) Lung and colon cancer classification using medical imaging: a feature engineering approach. Phys Eng Sci Med 45:729–746. https://doi.org/10.1007/s13246-022-01139-x
Aitazaz T, Tubaishat A, Al-Obeidat F et al (2023) Transfer learning for histopathology images: an empirical study. Neural Comput Appl 35:7963–7974. https://doi.org/10.1007/s00521-022-07516-7
Saric M, Russo M, Stella M (2019) CNN-based method for lung cancer detection in whole slide histopathology images. In: 2019 4th International Conference on Smart and Sustainable Technologies (SpliTech). IEEE, pp 1–4
Tjoa E, Guan C (2021) A survey on Explainable Artificial Intelligence (XAI): toward medical XAI. IEEE Trans Neural Netw Learn Syst 32:4793–4813. https://doi.org/10.1109/TNNLS.2020.3027314
Samek W, Wiegand T, Müller KR (2017) Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296
O’Mahony N, Campbell S, Carvalho A et al (2020) Deep Learning vs. Traditional Computer Vision. In: Advances in Computer Vision: Proceedings of the 2019 Computer Vision Conference (CVC). Springer, pp 128–144
Borkowski AA, Bui MM, Brannon Thomas L et al (2019) Lung and Colon Cancer Histopathological Image Dataset (LC25000). arXiv preprint arXiv:1912.12142
Swain MJ, Ballard DH (1992) Indexing via color histograms. In: Active perception and robot vision. Springer, pp 261–273
Ke G, Meng Q, Finley T et al (2017) LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30
Zachary J, Iyengar SS, Barhen J (2001) Content based image retrieval and information theory: a general approach. J Am Soc Inform Sci Technol 52:840–852. https://doi.org/10.1002/asi.1138
Nagarajan G, Minu RI, Muthukumar B et al (2016) Hybrid genetic algorithm for medical image feature extraction and selection. Procedia Comput Sci 85:455–462. https://doi.org/10.1016/j.procs.2016.05.192
Aoulalay A, El Makhfi N, Abounaima MC, Massar M (2020) Classification of Moroccan decorative patterns based on machine learning algorithms. In: 2020 IEEE 2nd International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS). IEEE, pp 1–7
Alamdar F, Keyvanpour MR (2011) A new color feature extraction method based on QuadHistogram. Procedia Environ Sci 10:777–783. https://doi.org/10.1016/j.proenv.2011.09.126
Yue J, Li Z, Liu L, Fu Z (2011) Content-based image retrieval using color and texture fused features. Math Comput Model 54:1121–1127. https://doi.org/10.1016/j.mcm.2010.11.044
Ruela M, Barata C, Mendonça T, Marques JS (2013) What is the role of color in dermoscopy analysis? In: Pattern Recognition and Image Analysis: 6th Iberian Conference, IbPRIA 2013, Funchal, Madeira, Portugal, June 5-7, 2013. Proceedings, vol 6. Springer, pp 819–826
Liang J, Bu Y, Tan K et al (2022) Estimation of stellar atmospheric parameters with light gradient boosting machine algorithm and principal component analysis. Astron J 163:153. https://doi.org/10.3847/1538-3881/ac4d97
Alzamzami F, Hoda M, El Saddik A (2020) Light gradient boosting machine for general sentiment classification on short texts: a comparative evaluation. IEEE Access 8:101840–101858. https://doi.org/10.1109/ACCESS.2020.2997330
Ayubkhan SAH, Yap W-S, Morris E, Rawthar MBK (2022) A practical intrusion detection system based on denoising autoencoder and LightGBM classifier with improved detection performance. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-022-04449-w
Zhang J, Mucs D, Norinder U, Svensson F (2019) LightGBM: an effective and scalable algorithm for prediction of chemical toxicity-application to the Tox21 and mutagenicity data sets. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.9b00633
Devroye LP, Wagner TJ (1979) Distribution-free performance bounds for potential function rules. IEEE Trans Inf Theory 25:601–604. https://doi.org/10.1109/TIT.1979.1056087
Xiong Z, Cui Y, Liu Z et al (2020) Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation. Comput Mater Sci 171:109203. https://doi.org/10.1016/j.commatsci.2019.109203
Kärkkäinen T (2014) On cross-validation for MLP model evaluation. In: Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshop, S+ SSPR 2014, Joensuu, Finland, August 20-22, 2014. Proceedings. Springer, pp 291–300
Yadav S, Shukla S (2016) Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. In: 2016 IEEE 6th International conference on advanced computing (IACC). IEEE, pp 78–83
Berrar D (2019) Cross-Validation. In: Ranganathan Shoba, Gribskov Michael, Nakai Kenta, Schönbach Christian (eds) Encyclopedia of Bioinformatics and Computational Biology. Academic Press, Oxford, pp 542–545
Arjaria SK, Rathore AS, Cherian JS (2021) Kidney disease prediction using a machine learning approach: A comparative and comprehensive analysis. In: Demystifying big data, machine learning, and deep learning for healthcare analytics. Academic Press, pp 307–333
Shafiullah M, Abido MA, Al-Mohammed AH (2022) Chapter 11 - Smart grid fault diagnosis under load and renewable energy uncertainty. In: Power System Fault Diagnosis. Elsevier, pp 293–346
Toğaçar M (2021) Disease type detection in lung and colon cancer images using the complement approach of inefficient sets. Comput Biol Med 137:104827. https://doi.org/10.1016/j.compbiomed.2021.104827

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Deenbandhu Chhotu Ram University of Science and Technology, Murthal, Haryana, India
Indu Chhillar & Ajmer Singh

Corresponding author

Correspondence to Indu Chhillar.

Ethics declarations

Ethics approval

This study uses public databases cited in the references and therefore this section is not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chhillar, I., Singh, A. A feature engineering-based machine learning technique to detect and classify lung and colon cancer from histopathological images. Med Biol Eng Comput 62, 913–924 (2024). https://doi.org/10.1007/s11517-023-02984-y

Received: 25 July 2023
Accepted: 29 November 2023
Published: 13 December 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11517-023-02984-y
