Effect of sample size on prognostic genes analysis in non-small cell lung cancer

  • Original Article
Molecular Genetics and Genomics

Abstract

The identification of prognostic genes can aid the clinical management of non-small cell lung cancer (NSCLC). However, the prognostic genes identified in different NSCLC studies overlap very little. One possible reason is inadequate sample size. Here, the effect of sample size on prognostic genes analysis was investigated using 515 stage II/III NSCLC cases from two cohorts profiled by whole-exome sequencing. At each sample size level, prognostic genes analysis was repeated 100 times on randomly resampled cases. In stage II lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) cases from the TCGA Pan-Lung Cancer cohort, the number of statistically significant prognostic genes first increased with sample size following a power law, then fluctuated around a stable level, and finally decreased slightly. Power-law growth curves were also observed in stage III LUAD and LUSC cases from the TCGA Pan-Lung Cancer cohort and in stage III Chinese LUAD cases from the OncoSG cohort. The coefficients of determination (R²) of the fitted power-law growth curves were all greater than 0.99. In addition, at the sample size level where the number of prognostic genes peaked, the mean proportion of true prognostic genes in patients with stage II LUAD and LUSC was 28.32% and 23.12%, respectively, which could partly explain the limited overlap in prognostic genes between reports. In conclusion, the number of prognostic genes grows with sample size following a power law in NSCLC, independent of histopathological subtype, race, and stage. These results also show how sample size affects the reliability of prognostic genes and will aid trial design for genomic mutation-based prognostic studies in NSCLC.
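The resampling and curve-fitting procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It rests on several assumptions: the power law is taken to have the form count ≈ a · n^b, the fit is done by least squares on log-log axes, subsamples are drawn without replacement, and `count_fn` is a hypothetical callback standing in for the actual survival analysis that counts significant genes in a subsample. The demonstration uses a synthetic counting function, not data from the study.

```python
import numpy as np

def mean_count_by_size(n_total, sizes, count_fn, n_repeats=100, seed=0):
    """Mean number of 'significant genes' over repeated random subsamples.

    count_fn is a hypothetical callback: it receives an array of case
    indices and returns the number of prognostic genes found in them.
    """
    rng = np.random.default_rng(seed)
    means = []
    for n in sizes:
        # Assumption: subsample cases without replacement at each size level.
        counts = [count_fn(rng.choice(n_total, size=n, replace=False))
                  for _ in range(n_repeats)]
        means.append(np.mean(counts))
    return np.array(means)

def fit_power_law(sizes, counts):
    """Fit counts ≈ a * sizes**b on log-log axes; return (a, b, r2)."""
    x = np.log(np.asarray(sizes, dtype=float))
    y = np.log(np.asarray(counts, dtype=float))
    b, log_a = np.polyfit(x, y, 1)          # slope is the exponent b
    resid = y - (log_a + b * x)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return float(np.exp(log_a)), float(b), float(r2)

# Synthetic stand-in for the real analysis: the count depends only on
# subsample size and follows an exact power law (illustration only).
def toy_count(idx):
    return 0.5 * len(idx) ** 1.5

sizes = [20, 40, 60, 80, 100]
means = mean_count_by_size(200, sizes, toy_count, n_repeats=5)
a, b, r2 = fit_power_law(sizes, means)
print(f"a={a:.3f}, b={b:.3f}, R^2={r2:.4f}")
```

Note that R² here is computed on the log-transformed values; a fit on the original scale (for example, nonlinear least squares) would generally give a different R², and the abstract does not say which was used.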

Data availability

All data generated or analyzed during this study are included in this published article.

Funding

Not applicable.

Author information

Contributions

PL and YL designed the study and reviewed the manuscript. PL, HL, and ZW performed the data collection and statistical analysis. PL and ZW prepared the figures and drafted the manuscript. All the authors revised the manuscript and approved the final version.

Corresponding author

Correspondence to Yanan Lu.

Ethics declarations

Conflict of interests

The authors declare that they have no competing interests.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by Shuhua Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (XLSX 13 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, P., Li, H., Wan, Z. et al. Effect of sample size on prognostic genes analysis in non-small cell lung cancer. Mol Genet Genomics 298, 549–554 (2023). https://doi.org/10.1007/s00438-023-01999-2
