Machine-learning-based classification of diffuse large B-cell lymphoma patients by a 7-mRNA signature enriched with immune infiltration and cell cycle

  • RESEARCH ARTICLE
Clinical and Translational Oncology

Abstract

Background

Diffuse large B-cell lymphoma (DLBCL) is remarkably heterogeneous, yet identifying the DLBCL subpopulations that predict prognosis and guide clinical treatment remains an unresolved problem.

Methods

Molecular subgroups were identified in gene expression data from GSE10846 using a consensus clustering algorithm. Gene set enrichment analysis, immune infiltration analysis, and the proposed cell cycle algorithm were then applied to explore the biological functions of the subtypes. Univariate and multivariate Cox regression analyses were used to evaluate independent prognostic factors of DLBCL. Finally, key genes were screened by Lasso regression, the Random Forest algorithm, and point-biserial correlation; the prognostic model was built on these genes with the best-performing classifier among seven machine learning algorithms and validated in three external datasets (GSE34171, GSE87371, GSE31312).
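To make the consensus clustering step more concrete, the following is a minimal toy sketch, not the authors' pipeline. It assumes made-up one-dimensional scores in place of expression profiles, a tiny 2-means clusterer, and hypothetical function names (`two_means`, `consensus_matrix`). The idea it illustrates is the general one: cluster many random subsamples and record how often each pair of samples lands in the same cluster.

```python
import random


def two_means(points, iters=20):
    """Tiny 2-means on 1-D values; returns a 0/1 label per point."""
    c0, c1 = min(points), max(points)  # deterministic initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if abs(p - c0) <= abs(p - c1) else 1 for p in points]
        g0 = [p for p, lab in zip(points, labels) if lab == 0]
        g1 = [p for p, lab in zip(points, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels


def consensus_matrix(values, n_resamples=50, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which samples i and j share a cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]  # times i and j were co-clustered
    sampled = [[0] * n for _ in range(n)]   # times i and j were both drawn
    size = max(2, int(frac * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = two_means([values[i] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical per-sample scores: two well-separated groups of four samples.
scores = [0.1, 0.2, 0.15, 0.3, 5.0, 5.2, 4.9, 5.1]
M = consensus_matrix(scores)
print(M[0][1], M[0][4])
```

Entries near 1 mean a pair is almost always co-clustered and entries near 0 mean it almost never is; a clean block structure in this matrix is what supports choosing a given number of clusters.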

Results

Comprehensive genomic analysis of 1,143 DLBCL samples identified two molecularly and prognostically relevant subtypes: immune-enriched (IME) and cell-cycle-enriched (CCE). A new predictive model comprising seven key genes (SERPING1, TIMP2, NME1, DCTPP1, RFC4, POLE2, and SNRPD1) was then developed with the Support Vector Machine (SVM) algorithm in 414 patients from GSE10846, reaching high prediction accuracy (88.6%) and strong predictive power (AUC = 0.973). Predictive power was similar in the three external testing sets (HR > 1.400, p < 0.05).
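For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and AUC can be computed from classifier output. It is an illustration under stated assumptions: the labels and scores are invented, the 1/0 encoding of the two subtypes is arbitrary, and the function names (`accuracy`, `roc_auc`) are hypothetical. AUC is computed here with the rank-based definition, that is, the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Rank-based AUC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Invented example: 1 and 0 stand for the two subtypes (arbitrary encoding).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]      # hypothetical classifier scores
y_pred = [1 if s > 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(acc, 3), round(auc, 3))
```

Accuracy depends on the decision threshold (0.5 here, an assumption), whereas AUC summarises ranking quality across all thresholds, which is why the two are reported separately.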

Conclusion

Compared with other clinical risk factors, this model could evaluate survival independently and with strong predictive power. Our study constructed a reliable model to predict two new subtypes of DLBCL patients, which could guide the implementation of individualized treatment.

Data availability

The datasets used in the current study are available in the GEO repository (http://www.ncbi.nlm.nih.gov/geo/) under accession codes GSE10846, GSE34171, GSE87371, and GSE31312. The code that calculates the relative proportion of cells in each cell cycle phase for a sample from its gene expression profile is available at https://doi.org/10.5281/zenodo.6613589 (the most recent version as well as the archived version referenced in the study).

Abbreviations

DLBCL: Diffuse large B-cell lymphoma
DEGs: Differentially expressed genes
IME: Immune-enriched
CCE: Cell-cycle-enriched
CHOP: Cyclophosphamide, doxorubicin, vincristine, and prednisone
RCHOP: Rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone
MCP-counter: Microenvironment cell populations-counter
EPIC: Estimating the proportion of immune and cancer cells
LDH: Lactate dehydrogenase
GCB: Germinal center B-cell-like
ABC: Activated B-cell-like
IPI: International Prognostic Index
GEO: Gene Expression Omnibus
GSEA: Gene set enrichment analysis
log2FC: Log2 fold change
adj.p: Adjusted p value
EMT: Epithelial to mesenchymal transition
MSigDB: Molecular Signatures Database
FDR: False discovery rate
BH: Benjamini–Hochberg
AUC: Area under the curve
ROC: Receiver operating characteristic
SVM: Support Vector Machine
DT: Decision Tree
NB: Naive Bayes
RF: Random Forest

Acknowledgements

We acknowledge the National Natural Science Foundation of China (Grant Number: 62076015) for financial support.

Funding

This work was supported by the National Natural Science Foundation of China (Grant number: 62076015).

Author information

Contributions

SZ conceived and supervised this study. XZ and BL conducted the major bioinformatics and biostatistics analyses of the data and produced all the figures and tables in this manuscript. JL, HW, JY, XJ, JL, NZ, LL, YC, and ZL assisted with sample collection and manuscript revision. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shuangtao Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

None.

Informed consent

No informed consent is required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Table S1. Marker gene list of G1, S, and G2M for cell cycle analysis. (XLSX 11 kb)

Table S2. 1027 DEGs selected by limma analysis between two subgroups from 414 DLBCL samples. (XLSX 117 kb)

Table S3. Immune infiltration result for 414 DLBCL samples from MCP-counter. (XLSX 52 kb)

Table S4. Immune infiltration result for 414 DLBCL samples from EPIC. (XLSX 44 kb)

Table S5. The proportion of cells in G1, S, and G2M cell cycle phases for 414 DLBCL samples. (XLSX 33 kb)

Table S6. Screening of characteristic genes between IME and CCE groups in 414 DLBCL samples. (XLSX 16 kb)

Fig. S1. Consensus clustering for 181 CHOP and 233 RCHOP samples from GSE10846. (a) Consensus clustering matrix of 181 CHOP samples from GSE10846 for k = 2 (cluster 1: n = 71, cluster 2: n = 110). (b) Consensus clustering matrix of 181 CHOP samples from GSE10846 for k = 3 (cluster 1: n = 70, cluster 2: n = 71, cluster 3: n = 40). (c) Consensus clustering CDF of 181 CHOP samples for k = 2 to k = 8. (d) Consensus clustering matrix of 233 RCHOP samples from GSE10846 for k = 2 (cluster 1: n = 105, cluster 2: n = 128). (e) Consensus clustering matrix of 233 RCHOP samples from GSE10846 for k = 3 (cluster 1: n = 79, cluster 2: n = 73, cluster 3: n = 81). (f) Consensus clustering CDF of 233 RCHOP samples for k = 2 to k = 8. (PNG 847 kb)

Fig. S2. Kaplan–Meier survival analysis of CHOP and RCHOP patients in GSE10846. (a) Kaplan–Meier survival analysis of 181 CHOP samples in cluster 1 and cluster 2. (b) Kaplan–Meier survival analysis of 233 RCHOP samples in cluster 1 and cluster 2. (PNG 408 kb)

Fig. S3. Screening of key genes using Lasso regression, Random Forest, and point-biserial correlation. (a) Selection of 45 DEGs in IME using the Lasso regression model via minimum criteria. (b) Selection of 15 DEGs in CCE using the Lasso regression model via minimum criteria. (c) The feature importance of the top 30 key genes from 217 DEGs in IME, indicated by the MeanDecreaseAccuracy value from the Random Forest model. (d) The feature importance of 25 DEGs in CCE, indicated by the MeanDecreaseAccuracy value from the Random Forest model. (e) 41 genes with point-biserial correlation coefficients greater than 0.5 in IME. (f) 6 genes with point-biserial correlation coefficients greater than 0.5 in CCE. (PNG 2450 kb)

Fig. S4. Differential analysis between IME and CCE in three external validation sets including GSE34171 (a), GSE87371 (b), and GSE31312 (c). (PNG 1095 kb)
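The point-biserial screening mentioned for Fig. S3 (coefficients greater than 0.5) can be illustrated with a short sketch. This is an assumption-laden example rather than the study's code: the gene names (`GENE_A`, `GENE_B`), expression values, group encoding, and the `point_biserial` helper are all hypothetical. The coefficient is the standard one, r_pb = (M1 − M0) / s_n × sqrt(n1·n0 / n²), where s_n is the population standard deviation of all values.

```python
import math


def point_biserial(groups, values):
    """Point-biserial correlation between a 0/1 grouping and numeric values."""
    n = len(values)
    g1 = [v for g, v in zip(groups, values) if g == 1]
    g0 = [v for g, v in zip(groups, values) if g == 0]
    if not g1 or not g0:
        raise ValueError("both groups must be non-empty")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population SD
    if sd == 0:
        return 0.0  # constant expression carries no group information
    m1 = sum(g1) / len(g1)
    m0 = sum(g0) / len(g0)
    return (m1 - m0) / sd * math.sqrt(len(g1) * len(g0) / n ** 2)


# Hypothetical data: 1 = samples of the subtype of interest, 0 = the rest.
groups = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [8, 9, 10, 2, 3, 4],  # clearly higher in group 1
    "GENE_B": [5, 1, 4, 5, 1, 4],   # same mean in both groups
}

THRESHOLD = 0.5  # cut-off quoted in the Fig. S3 caption
r = {gene: point_biserial(groups, vals) for gene, vals in expression.items()}
selected = [gene for gene, coef in r.items() if coef > THRESHOLD]
print(selected)
```

The sketch filters on the signed coefficient, matching the wording "greater than 0.5"; screening on the absolute value instead would also retain genes that are strongly lower in the group of interest.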

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhuang, X., Liu, B., Long, J. et al. Machine-learning-based classification of diffuse large B-cell lymphoma patients by a 7-mRNA signature enriched with immune infiltration and cell cycle. Clin Transl Oncol 26, 936–950 (2024). https://doi.org/10.1007/s12094-023-03326-y

  • DOI: https://doi.org/10.1007/s12094-023-03326-y
