Discovery of potential biomarkers for lung cancer classification based on human proteome microarrays using Stochastic Gradient Boosting approach

  • Research
  • Journal of Cancer Research and Clinical Oncology

Abstract

Purpose

Early identification of lung cancer (LC) considerably facilitates intervention and prevention. Human proteome microarrays can serve as a “liquid biopsy” that complements conventional LC diagnosis, but analyzing them requires advanced bioinformatics methods such as feature selection (FS) and refined machine learning models.

Methods

A two-stage FS methodology that combines Pearson’s correlation (PC) with either a univariate filter (SBF) or recursive feature elimination (RFE) was used to reduce redundancy in the original dataset. Stochastic Gradient Boosting (SGB), Random Forest (RF), and Support Vector Machine (SVM) techniques were applied to build ensemble classifiers on four feature subsets. The synthetic minority oversampling technique (SMOTE) was used to preprocess the imbalanced data.
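The two-stage idea described above (a correlation-based redundancy filter followed by a univariate filter) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the correlation cutoff or the filter statistic, so the 0.9 threshold, the Welch t statistic, the |t| >= 2 cutoff, and the protein names and intensities are all assumptions made for the example.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Stage 1: keep a feature only if |r| with every kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se if se > 0 else 0.0


def univariate_filter(features, labels, names, min_abs_t=2.0):
    """Stage 2: keep features whose |t| between the two classes reaches min_abs_t."""
    selected = []
    for name in names:
        pos = [v for v, y in zip(features[name], labels) if y == 1]
        neg = [v for v, y in zip(features[name], labels) if y == 0]
        if abs(welch_t(pos, neg)) >= min_abs_t:
            selected.append(name)
    return selected


# Toy data: hypothetical protein names and intensities, not from the study.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "prot_a": [5.1, 4.8, 5.3, 5.0, 2.0, 2.2, 1.9, 2.1],     # separates the classes
    "prot_b": [10.2, 9.6, 10.6, 10.0, 4.0, 4.4, 3.8, 4.2],  # 2 * prot_a: redundant
    "prot_c": [3.0, 1.0, 2.0, 4.0, 3.0, 1.0, 2.0, 4.0],     # same in both classes
}
stage1 = drop_correlated(features, threshold=0.9)
stage2 = univariate_filter(features, labels, stage1, min_abs_t=2.0)
print(stage1, stage2)
```

In this toy run, prot_b is dropped in stage 1 because it is perfectly correlated with prot_a, and prot_c is dropped in stage 2 because its class means are identical. An RFE-based second stage would instead refit a model repeatedly and discard the least important features at each round.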

Results

The FS approaches with SBF and RFE extracted 25 and 55 features, respectively, 14 of which overlapped. All three ensemble models demonstrated high accuracy (0.867 to 0.967) and sensitivity (0.917 to 1.00) on the test datasets, with SGB on the SBF subset outperforming the others. SMOTE improved model performance during training. Three of the top-ranked candidate biomarkers (LGR4, CDC34, and GHRHR) were strongly suggested to play a role in lung tumorigenesis.
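For readers less familiar with the reported metrics, the following sketch shows how accuracy, sensitivity, and specificity are derived from a binary confusion matrix. The function name and the toy labels are assumptions for illustration; they are not the study's data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), and specificity (recall on negatives)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels (hypothetical): 6 cancer cases (1) and 4 controls (0).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here one cancer case is missed and one control is misclassified, so sensitivity is 5/6, specificity is 3/4, and accuracy is 8/10.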

Conclusion

This study is the first to apply a novel hybrid FS method with classical ensemble machine learning algorithms to the classification of protein microarray data. The parsimonious model constructed by the SGB algorithm with appropriate FS and SMOTE performs well in the classification task, with high sensitivity and specificity. Standardization and innovation of bioinformatics approaches for protein microarray analysis need further exploration and validation.
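Since the final model relies on SMOTE to balance the classes, a minimal sketch of the core SMOTE step may help: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. The function name, the parameters `n_new`, `k`, and `seed`, and the toy points are assumptions for illustration; a tested implementation such as the one in the imbalanced-learn library would normally be used in practice.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of base, excluding base itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority-class samples (hypothetical two-feature intensities).
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_points = smote(minority, n_new=6, k=3, seed=0)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the bounding box of the minority class. Oversampling should be applied to the training data only, so that test-set metrics are computed on real samples.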

Data availability

Data and code supporting the results or analysis presented in this study are available upon reasonable request from Jianbo Pan and Yazhou Wu.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 82173621 and 81872716).

Author information

Contributions

All authors contributed to the study’s conception and design. NY: Conceptualization, Methodology, Software, Formal Analysis, Writing-Original Draft. JP: Methodology, Resources, Writing-Original Draft. XC: Formal analysis, Editing & polishing. PL: Software, Validation, Investigation. YL: Validation, Formal analysis, Visualization. ZW: Software, Editing & polishing. TY: Methodology, Formal Analysis. LQ: Validation, Visualization. DY: Conceptualization, Validation, Methodology. YW: Funding acquisition, Conceptualization, Resources, Methodology, Writing-review & editing, Supervision.

Corresponding authors

Correspondence to Dong Yi or Yazhou Wu.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

The data collection procedure involving human participants in this study was approved by the Ethics Committee of Fujian Provincial Hospital and conducted in accordance with the Declaration of Helsinki. Written informed consent was obtained from the participants of the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yao, N., Pan, J., Chen, X. et al. Discovery of potential biomarkers for lung cancer classification based on human proteome microarrays using Stochastic Gradient Boosting approach. J Cancer Res Clin Oncol 149, 6803–6812 (2023). https://doi.org/10.1007/s00432-023-04643-z

  • DOI: https://doi.org/10.1007/s00432-023-04643-z
