Abstract
Background
Diffuse large B-cell lymphoma (DLBCL) is remarkably heterogeneous, yet identifying DLBCL subpopulations that predict prognosis and guide clinical treatment remains an unresolved problem.
Methods
Molecular subgroups were identified in gene expression data from GSE10846 using a consensus clustering algorithm. Gene set enrichment analysis, immune infiltration analysis, and the proposed cell cycle algorithm were applied to explore the biological functions of the different subtypes. Univariate and multivariate Cox regression analyses were used to evaluate independent prognostic factors of DLBCL. Finally, a prognostic model based on key genes screened by Lasso regression, the Random Forest algorithm, and point-biserial correlation was constructed using the best-performing classifier among seven machine learning algorithms and validated in three external datasets (GSE34171, GSE87371, and GSE31312).
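The point-biserial screening step can be illustrated with a minimal Python sketch that computes the coefficient between a gene's expression values and a binary subtype label, then keeps genes above a cutoff. The function names, the toy data, and the use of a signed (not absolute) coefficient are assumptions made for illustration. The 0.5 cutoff follows the caption of Fig. S3. This is not the study's own code.

```python
import math


def point_biserial(values, labels):
    """Point-biserial correlation between a continuous variable and a 0/1 label."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must contain at least one sample")
    mean_all = sum(values) / n
    # Population standard deviation over all samples (divides by n).
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values are constant; the correlation is undefined")
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / (n * n))


def select_genes(expression, labels, threshold=0.5):
    """Return the genes whose coefficient exceeds the threshold (assumed cutoff)."""
    return [
        gene
        for gene, values in expression.items()
        if point_biserial(values, labels) > threshold
    ]


# Hypothetical toy data: 1 = one subtype, 0 = the other; not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_a": [8.0, 9.0, 10.0, 2.0, 3.0, 4.0],  # clearly higher in group 1
    "gene_b": [5.0, 6.0, 4.0, 5.0, 4.0, 6.0],   # same mean in both groups
}
selected = select_genes(expression, labels)
print(selected)
```

With this formula the coefficient equals the Pearson correlation between the expression values and the 0/1 label, so a gene with identical group means scores zero and is filtered out.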
Results
Comprehensive genomic analysis of 1,143 DLBCL samples identified two molecularly and prognostically relevant subtypes: immune-enriched (IME) and cell-cycle-enriched (CCE). A new predictive model comprising seven key genes (SERPING1, TIMP2, NME1, DCTPP1, RFC4, POLE2, and SNRPD1) was then developed using the Support Vector Machine (SVM) algorithm in 414 patients from GSE10846, achieving high prediction accuracy (88.6%) and strong predictive power (AUC = 0.973). The predictive power was similar in the three external testing sets (HR > 1.400, p < 0.05).
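For readers unfamiliar with the two performance measures quoted above, the following Python sketch shows one common way to compute classification accuracy and the area under the ROC curve (AUC) from true labels and classifier scores. The function names, the 0.5 decision threshold, and the toy data are assumptions for illustration only. The study's own figures were produced with its own pipeline, not with this code.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def roc_auc(true_labels, scores):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, t in zip(scores, true_labels) if t == 1]
    negatives = [s for s, t in zip(scores, true_labels) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: true subtype labels and classifier scores.
true_labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 to turn scores into predicted labels.
predicted = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(true_labels, predicted)
auc = roc_auc(true_labels, scores)
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two numbers can differ substantially for the same classifier.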
Conclusion
This model evaluated survival independently and showed strong predictive power compared with other clinical risk factors. Our study constructed a reliable model to predict two new subtypes of DLBCL patients, which could guide the implementation of individualized treatment.
Data availability
The dataset generated during the current study is available in the GEO (http://www.ncbi.nlm.nih.gov/geo/) repository with accession codes GSE10846, GSE34171, GSE87371, GSE31312. The code to calculate the relative proportion of cells in each cell cycle phase for samples based on gene expression profile is available at https://doi.org/10.5281/zenodo.6613589 (the most recent version as well as the archived version referenced in the study).
Abbreviations
- DLBCL: Diffuse large B-cell lymphoma
- DEGs: Differentially expressed genes
- IME: Immune-enriched
- CCE: Cell-cycle-enriched
- CHOP: Cyclophosphamide, doxorubicin, vincristine, and prednisone
- RCHOP: Rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone
- MCP-counter: Microenvironment cell populations-counter
- EPIC: Estimating the proportion of immune and cancer cells
- LDH: Lactate dehydrogenase
- GCB: Germinal center B-cell-like
- ABC: Activated B-cell-like
- IPI: International Prognostic Index
- GEO: Gene Expression Omnibus
- GSEA: Gene set enrichment analysis
- log2FC: Log2 fold change
- adj.p: Adjusted p value
- EMT: Epithelial to mesenchymal transition
- MSigDB: Molecular Signatures Database
- FDR: False discovery rate
- BH: Benjamini–Hochberg
- AUC: Area under the curve
- ROC: Receiver operating characteristic
- SVM: Support Vector Machine
- DT: Decision Tree
- NB: Naive Bayesian
- RF: Random Forest
Acknowledgements
We acknowledge the National Natural Science Foundation of China (Grant Number: 62076015) for financial support.
Funding
This work was supported by the National Natural Science Foundation of China (Grant number: 62076015).
Author information
Contributions
SZ conceived and supervised this study. XZ and BL conducted the major bioinformatics and biostatistics analyses of these data and produced all the figures and tables in this manuscript. JL, HW, JY, XJ, JL, NZ, LL, YC, and ZL assisted with sample collection and manuscript revision. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
None.
Informed consent
No informed consent is required.
Supplementary Information
Below is the link to the electronic supplementary material.
Table S1. Marker gene list of G1, S, and G2M for cell cycle analysis. (XLSX 11 kb)
Table S2. 1027 DEGs selected by limma analysis between two subgroups from 414 DLBCL samples. (XLSX 117 kb)
Table S3. Immune infiltration result for 414 DLBCL samples from MCP-counter. (XLSX 52 kb)
Table S4. Immune infiltration result for 414 DLBCL samples from EPIC. (XLSX 44 kb)
Table S5. The proportion of cells in G1, S, and G2M cell cycle phases for 414 DLBCL samples. (XLSX 33 kb)
Table S6. Screening of characteristic genes between IME and CCE groups in 414 DLBCL. (XLSX 16 kb)
Fig. S1. Consensus clustering for 181 CHOP and 233 RCHOP samples from GSE10846. (a) Consensus clustering matrix of 181 CHOP samples from GSE10846 for k = 2 (cluster 1: n = 71, cluster 2: n = 110). (b) Consensus clustering matrix of 181 CHOP samples from GSE10846 for k = 3 (cluster 1: n = 70, cluster 2: n = 71, cluster 3: n = 40). (c) Consensus clustering CDF of 181 CHOP samples for k = 2 to k = 8. (d) Consensus clustering matrix of 233 RCHOP samples from GSE10846 for k = 2 (cluster 1: n = 105, cluster 2: n = 128). (e) Consensus clustering matrix of 233 RCHOP samples from GSE10846 for k = 3 (cluster 1: n = 79, cluster 2: n = 73, cluster 3: n = 81). (f) Consensus clustering CDF of 233 RCHOP samples for k = 2 to k = 8. (PNG 847 kb)
Fig. S2. Kaplan–Meier survival analysis of CHOP and RCHOP patients in GSE10846. (a) Kaplan–Meier survival analysis of 181 CHOP samples in cluster 1 and cluster 2. (b) Kaplan–Meier survival analysis of 233 RCHOP samples in cluster 1 and cluster 2. (PNG 408 kb)
Fig. S3. Screening of key genes using Lasso regression, Random Forest, and Point-biserial correlation. (a) Selection of 45 DEGs in IME using the Lasso regression model via minimum criteria. (b) Selection of 15 DEGs in CCE using the Lasso regression model via minimum criteria. (c) The feature importance of the top 30 key genes from 217 DEGs in IME is indicated by the MeanDecreaseAccuracy value from the Random Forest model. (d) The feature importance of 25 DEGs in CCE was indicated by the MeanDecreaseAccuracy value from the Random Forest model. (e) 41 genes with point-biserial correlation coefficients greater than 0.5 in IME. (f) 6 genes with point-biserial correlation coefficients greater than 0.5 in CCE. (PNG 2450 kb)
Fig. S4. Differential analysis between IME and CCE in three external validation sets including GSE34171 (a), GSE87371 (b), and GSE31312 (c). (PNG 1095 kb)
About this article
Cite this article
Zhuang, X., Liu, B., Long, J. et al. Machine-learning-based classification of diffuse large B-cell lymphoma patients by a 7-mRNA signature enriched with immune infiltration and cell cycle. Clin Transl Oncol 26, 936–950 (2024). https://doi.org/10.1007/s12094-023-03326-y
DOI: https://doi.org/10.1007/s12094-023-03326-y