A Method for Cancer Genomics Feature Selection Based on LASSO-RFE

  • Research Paper
Iranian Journal of Science and Technology, Transactions A: Science

Abstract

A more efficient feature selection method was developed to screen genes associated with specific cancers so that their pathogenesis can be investigated further. The proposed LASSO-RFE model is a least absolute shrinkage and selection operator (LASSO) classifier built on the idea of recursive feature elimination (RFE). To verify the efficiency of the proposed algorithm, performance tests were conducted on four kinds of gene expression RNA sequencing data publicly available in The Cancer Genome Atlas (TCGA). The numerical experiments illustrate that, compared with three typical feature selection algorithms, LASSO-RFE yields a more accurate classification prediction model and selected gene features with clearer biological interpretability. In these experiments, LASSO-RFE reduced the tens of thousands of features in the original data to three dimensions and gave the classification model better performance than mutual information, L1-SVM, and tree-based selection. The model retains the ability of the standard LASSO algorithm to filter out redundant and irrelevant features, while the RFE step improves biological interpretability relative to traditional feature reduction methods. Only a limited number of data cases were validated in this paper, and the application of LASSO-RFE to more recent data remains to be investigated.
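As a rough illustration of the idea summarized in the abstract, the following sketch wraps a LASSO fit inside a recursive elimination loop: fit, rank the surviving features by absolute coefficient, drop the weakest, and repeat until a target number remains. It is not the paper's implementation, which the abstract does not specify. The function names (`lasso_fit`, `lasso_rfe`), the penalty `alpha`, the `drop_frac` schedule, the coding of class labels as ±1 regression targets, and the synthetic data standing in for TCGA expression values are all assumptions made for this example.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_fit(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w

def lasso_rfe(X, y, n_keep, alpha=0.05, drop_frac=0.5):
    """Refit LASSO and drop the weakest features until n_keep remain (assumed schedule)."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = lasso_fit(X_sub, y, alpha)
        # rank surviving features by absolute coefficient, largest first
        order = sorted(range(len(active)), key=lambda k: abs(w[k]), reverse=True)
        # always drop at least one feature, never go below n_keep
        n_next = max(n_keep, min(len(active) - 1, int(len(active) * (1 - drop_frac))))
        active = sorted(active[k] for k in order[:n_next])
    return active

# Synthetic stand-in for an expression matrix (hypothetical data, not TCGA):
# 60 samples, 30 features, with the class label driven by features 0, 1 and 2.
random.seed(0)
n_samples, n_features = 60, 30
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [1.0 if row[0] - row[1] + row[2] + random.gauss(0.0, 0.3) > 0 else -1.0 for row in X]

selected = lasso_rfe(X, y, n_keep=3)
print("selected feature indices:", selected)
```

Dropping half of the surviving features per round keeps the number of refits small; setting `drop_frac` close to zero removes one feature at a time, which is closer to classical RFE but slower on tens of thousands of genes.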

Data Availability

All data are available from the corresponding author.

Funding

None.

Author information

Contributions

CA contributed to conceptualization, data curation, writing—original draft, and writing—review and editing.

Corresponding author

Correspondence to Chen Ai.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

About this article

Cite this article

Ai, C. A Method for Cancer Genomics Feature Selection Based on LASSO-RFE. Iran J Sci Technol Trans Sci 46, 731–738 (2022). https://doi.org/10.1007/s40995-022-01292-8

  • DOI: https://doi.org/10.1007/s40995-022-01292-8
