Introduction

Acute myeloid leukemia (AML) comprises a group of clonal hematopoietic malignancies that originate from myeloid precursors and is highly heterogeneous in its molecular, cytogenetic and clinical features [1]. Genetic and molecular abnormalities are closely associated with the leukemogenesis and prognosis of AML [2]. Although mutations in several genes, such as FLT3, CEBPA, NPM1, TP53, RUNX1 and ASXL1, are well established in AML, the current understanding of the molecular mechanisms involved in the development and progression of AML is still limited [3]. Precise risk stratification and prognosis assessment are of great significance in the selection of treatment for AML patients [4]. Therefore, the identification of molecular alterations that can predict the clinical outcomes of AML patients may contribute to the development of AML-specific targeted therapies.

Evidence suggests that cancer-testis (CT) antigens, which are expressed during germ cell and embryonic development, may contribute to stemness and thereby exert an important oncogenic effect in cancer cells [5]. To date, a cluster of proteins named sperm-associated antigens (SPAG) has been identified, of which 15 members (SPAG1, SPAG2/UAP1, SPAG3/SPAG8, SPAG4, SPAG5, SPAG6, SPAG7, SPAG9, SPAG10/MFGE8, SPAG11B, SPAG12/NHP2L1, SPAG13/SSFA2, SPAG15/SPAM1, SPAG16 and SPAG17) are CT antigens [6]. Over the years, ample research has reported the vital role of SPAG proteins in solid tumors, where they may function as promising new biomarkers for diagnosis and prognosis, yet systematic investigation of SPAG family member expression and its clinical relevance in AML is lacking [6].

To our knowledge, this study is the first to report that, among SPAG family members, SPAG1 mRNA expression is negatively associated with survival in AML. Moreover, the prognostic value of SPAG1 overexpression in AML was further confirmed by our data. High expression of SPAG1 mRNA was associated with specific genetic abnormalities (at both the cytogenetic and molecular levels) in AML. Despite these associations, SPAG1 overexpression could also function independently as a prognostic biomarker in AML, and it may serve as a reference for consolidation therapy selection between chemotherapy and hematopoietic stem cell transplantation (HSCT).

Materials and methods

Public datasets

The identification cohort comprised 173 AML patients with RNA-Seq V2 data for SPAG family members (SPAG1, SPAG2/UAP1, SPAG3/SPAG8, SPAG4, SPAG5, SPAG6, SPAG7, SPAG9, SPAG10/MFGE8, SPAG11B, SPAG12/NHP2L1, SPAG13/SSFA2, SPAG15/SPAM1, SPAG16 and SPAG17) from The Cancer Genome Atlas (TCGA) [7]. The treatment regimens for these patients included induction therapy and consolidation therapy. All patients received standard chemotherapy as induction therapy. Following induction chemotherapy, a total of 100 patients underwent chemotherapy only, whereas 73 patients received HSCT as consolidation treatment. In addition, the expression of SPAG1 in AML compared with controls was analyzed in GEPIA (http://gepia.cancer-pku.cn/).

Three independent cohorts from the Gene Expression Omnibus (GEO) database (GSE12417, GSE6891 and GSE37642) were used to validate the prognostic value of SPAG1 expression in AML. The association of the SPAG1 expression level with the prognosis of 78 and 162 cytogenetically normal AML (CN-AML) patients was analyzed in the GSE12417 dataset with the public platform GenomicScape (http://genomicscape.com/microarray/survival.php) [8, 9]. The GSE6891 dataset consisted of 461 AML patients, whereas the GSE37642 dataset comprised 562 AML patients. Kaplan–Meier analysis was performed to explore the prognostic value of SPAG1 expression, with patients divided into two groups using the median level of SPAG1 expression as the cutoff.
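The median-based grouping and the Kaplan–Meier estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy expression and follow-up values are hypothetical, ties with the median are assigned to the low group by our own convention, and the published analyses were run with the tools named in the text rather than with this code.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' if its value is above the cohort median, else 'low'.

    Assigning ties to 'low' is an assumption of this sketch.
    """
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time point where at least one event occurred.

    times  : follow-up time of each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy data: expression values, follow-up in months, event flags.
expression = [0.2, 1.5, 0.7, 2.1, 0.4, 1.9]
months = [30, 8, 24, 5, 36, 12]
died = [0, 1, 1, 1, 0, 1]

groups = split_by_median(expression)
for label in ("low", "high"):
    t = [m for m, g in zip(months, groups) if g == label]
    e = [d for d, g in zip(died, groups) if g == label]
    print(label, kaplan_meier(t, e))
```

Comparing the two curves formally would additionally require a log-rank test, which is omitted here for brevity.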

Patients

The validation cohort included 131 AML patients, with 86 enrolled at diagnosis and 45 at complete remission (CR), treated at our hospital. Patients with antecedent hematological diseases or therapy-related AML were excluded. The clinical characteristics of the cases are presented in Additional file 1: Table S1. Fifteen healthy bone marrow donors served as the controls. The age of the newly diagnosed AML patients (median 52, range 18–81) was similar to that of the AML patients at CR (controls) (median 45, range 28–66). The diagnosis and classification of AML patients followed the 2016 revised World Health Organization (WHO) and French–American–British (FAB) criteria [3, 10]. The treatment regimens of these AML cases were as reported [11,12,13]. The study protocol was approved by the Institutional Ethics Committee of the Affiliated People’s Hospital of Jiangsu University, and all the volunteers provided written informed consent.

Sample preparation, RNA isolation and reverse transcription

Clinical bone marrow (BM) specimens were sampled from the validation cohort of AML cases and controls who were treated in our hospital. We separated BM mononuclear cells (BMMNCs) and then extracted total RNA by using Lymphocyte Separation Medium (Solarbio, Beijing, China) and TRIzol reagent (Invitrogen, Carlsbad, CA), respectively. cDNA was synthesized via RNA reverse transcription as described previously [11,12,13].

Real-time quantitative PCR (RT–qPCR)

SPAG1 and ABL1 (housekeeping gene) transcripts were quantified by RT–qPCR using AceQ qPCR SYBR Green Master Mix (Vazyme Biotech Co., Piscataway, NJ). The primer sequences were 5′-TCTTCTGCGTCGTGCTAC-3′ (forward) and 5′-TTATCTCCACCGCCATCT-3′ (reverse) for SPAG1 as well as 5′-TCCTCCAGCTGTTATCTGGAAGA-3′ (forward) and 5′-TCCAACGAGCGGCTTCAC-3′ (reverse) for ABL1. The relative SPAG1 transcript level was calculated using the 2^(−ΔΔCt) method [11,12,13].
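For readers unfamiliar with the 2^(−ΔΔCt) calculation, a minimal sketch follows. The function name and the Ct values are hypothetical, and the choice of calibrator sample is an assumption of this example, since the text does not state which sample served as the calibrator.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative transcript level by the 2^(-ddCt) method.

    ct_target, ct_reference         : Ct of the target gene (e.g. SPAG1) and the
                                      housekeeping gene (e.g. ABL1) in the sample
    ct_target_cal, ct_reference_cal : the same two Ct values in the calibrator
                                      sample (assumed; not specified in the text)
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the sample reaches threshold two cycles earlier than
# the calibrator after normalization, i.e. a four-fold higher transcript level.
print(relative_expression(26.0, 24.0, 28.0, 24.0))
```

A value of 1 means the sample matches the calibrator; each unit decrease in ΔΔCt doubles the reported relative level.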

Bioinformatics analysis

All bioinformatics analyses were conducted as described in our previous reports [14, 15]. To obtain the differentially expressed genes/miRNAs (DEGs), RNA-sequencing (mRNA and microRNA) data were analyzed from the raw read counts with the R/Bioconductor package “edgeR” using the following filter conditions: |log2 fold change (FC)| > 1.5, false discovery rate (FDR) < 0.05 and P < 0.05. All analyses were controlled for FDR by the Benjamini–Hochberg procedure. Gene Set Enrichment Analysis (GSEA) software was used for enrichment analysis, and a pathway was considered significantly enriched at nominal (NOM) P < 0.05 and FDR Q < 0.05.
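The filtering step can be illustrated with a short Python sketch of the Benjamini–Hochberg adjustment followed by the three thresholds given above. It is not the edgeR workflow itself: the function names, gene labels, fold changes and P values are hypothetical, and in practice the per-gene P values and fold changes would come from the edgeR model fit.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, fc_cutoff=1.5, alpha=0.05):
    """Keep genes with |log2 FC| > fc_cutoff, P < alpha and FDR < alpha."""
    fdr = benjamini_hochberg(p_values)
    return [
        gene
        for gene, fc, p, q in zip(genes, log2_fc, p_values, fdr)
        if abs(fc) > fc_cutoff and p < alpha and q < alpha
    ]


# Hypothetical input: four placeholder genes with made-up statistics.
genes = ["G1", "G2", "G3", "G4"]
log2_fc = [2.0, -1.8, 0.5, -3.0]
p_values = [0.01, 0.04, 0.03, 0.005]
print(select_degs(genes, log2_fc, p_values))
```

Because the adjusted value is never smaller than the raw P value, the P < 0.05 condition is implied by FDR < 0.05; it is kept in the sketch only to mirror the stated filter.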

Statistical analysis

Continuous variables were compared using the Mann–Whitney U test or the Kruskal–Wallis test followed by Dunn’s post hoc test, and categorical variables using Pearson’s χ2 test or Fisher’s exact test. Both the Kaplan–Meier method (log-rank test) and Cox regression were used to analyze the association between SPAG1 expression and survival time, including leukemia-free survival (LFS), event-free survival (EFS) and overall survival (OS). The receiver operating characteristic (ROC) curve and area under the ROC curve (AUC) were used to assess the ability of SPAG1 expression to discriminate AML patients from controls. Two-sided P values < 0.05 were considered statistically significant in all analyses.
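As an illustration of the ROC-based quantities, the sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen case has a higher value than a randomly chosen control) and the sensitivity and specificity at a given cutoff. The function names and expression values are hypothetical, and calling a sample positive when its value is strictly above the cutoff is an assumption of this example.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of case-control pairs in which the case value is
    higher than the control value; ties contribute one half."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def sensitivity_specificity(case_values, control_values, cutoff):
    """Sensitivity and specificity when values above `cutoff` are called positive."""
    true_pos = sum(1 for v in case_values if v > cutoff)
    true_neg = sum(1 for v in control_values if v <= cutoff)
    return true_pos / len(case_values), true_neg / len(control_values)


# Hypothetical relative expression levels in patients and healthy donors.
patients = [1.2, 2.5, 0.9, 3.1]
donors = [0.3, 1.0, 0.8]
print(roc_auc(patients, donors))
print(sensitivity_specificity(patients, donors, cutoff=1.0))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of this size; a rank-based implementation would be preferable for large datasets.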

Results

Identification of SPAG1 among SPAG family members linked to AML prognosis in public datasets

To explore the prognostic significance of the SPAG family members (SPAG1/2/3/4/5/6/7/9/10/13/16/17) in AML, we first determined the impact of each SPAG member on survival time (both OS and LFS) by univariate Cox regression analysis among AML patients in the TCGA dataset. Each member was evaluated by comparing two groups of patients divided by the median expression level of that member. As presented in Table 1 and Additional file 1: Table S2, only SPAG1 expression was significantly associated with OS and LFS in AML (both P < 0.001), non-M3 AML (both P < 0.001) and CN-AML (P = 0.005 and 0.006, respectively). Furthermore, Kaplan–Meier analysis also revealed that patients with higher SPAG1 expression showed significantly shorter OS and LFS than those with lower SPAG1 expression among AML (both P < 0.001), non-M3 AML (both P < 0.001), and CN-AML (both P = 0.004) patients (Fig. 1a and b). In addition, the expression of SPAG1 was upregulated in AML patients, as analyzed by GEPIA (Fig. 1c).

Table 1 Cox regression univariate analysis of variables for overall survival in AML patients

Fig. 1 The impact of SPAG1 expression on survival of AML patients. a The effect of SPAG1 expression on overall survival in whole-cohort AML, non-M3 AML, and CN-AML from the TCGA dataset. b The effect of SPAG1 expression on disease/leukemia-free survival in whole-cohort AML, non-M3 AML, and CN-AML from the TCGA dataset. c SPAG1 expression in AML from the TCGA dataset. d The effect of SPAG1 expression on overall survival in 78 and 162 CN-AML patients from the GEO dataset (GSE12417) analyzed by the online web tool GenomicScape (http://genomicscape.com/microarray/survival.php). e The effect of SPAG1 expression on event-free survival and overall survival in AML from GEO datasets (GSE6891 and GSE37642)

Next, the prognostic value of SPAG1 expression in AML was further validated in GEO datasets including GSE12417, GSE6891 and GSE37642. For GSE12417, the online platform GenomicScape (http://genomicscape.com/microarray/survival.php) also confirmed the prognostic correlation of SPAG1 expression with OS in patients with CN-AML among two independent cohorts (P = 0.0035 and 0.05, respectively, Fig. 1d). For GSE6891 and GSE37642, Kaplan–Meier analysis showed that AML patients with higher SPAG1 expression had strikingly shorter EFS and/or OS times than those with lower SPAG1 expression (P = 0.025, 0.0025 and 0.045, respectively, Fig. 1e).

Clinical implications of SPAG1 expression in AML in the TCGA dataset

SPAG1 was the only SPAG member linked to AML prognosis, which prompted us to analyze the associations of SPAG1 expression with the clinical/biological characteristics of AML patients. The differences between the high and low SPAG1 groups in terms of sex, age, white blood cell (WBC) counts, peripheral blood (PB)/BM blasts, FAB classifications, cytogenetics, and gene mutations are shown in Table 2. Notably, cases with higher SPAG1 expression had markedly higher WBC counts than did those with lower SPAG1 expression (P = 0.014, Table 2). Furthermore, there were marked differences between the two groups in the distribution of FAB classifications and cytogenetics (P = 0.024, Table 2). Cases with higher SPAG1 expression were more commonly classified as FAB-M4/M5 (P = 0.058 and 0.050, respectively, Table 2). Regarding cytogenetics, patients with higher SPAG1 expression more commonly exhibited +8 (P = 0.034) and rarely t(8;21) (P = 0.014, Table 2). We further compared SPAG1 expression among groups with +8, t(8;21) or neither (Fig. 2a). In addition, we examined the associations of SPAG1 expression with several of the most frequent gene mutations in AML (Table 2). Higher SPAG1 expression was significantly or marginally associated with FLT3, DNMT3A, and WT1 mutations (P < 0.001, P = 0.001 and P = 0.057, respectively, Table 2). Moreover, we compared SPAG1 expression between patients carrying and not carrying these gene mutations and observed statistically significant differences in subgroups divided by FLT3 and DNMT3A status (P < 0.001 and P = 0.015, respectively, Fig. 2b and 2c), whereas only a trend was observed in subgroups divided by WT1 status (P = 0.051, Fig. 2d).

Table 2 Comparative analysis of SPAG1 expression with clinic-pathologic characteristics in AML

Fig. 2 The associations of SPAG1 expression with genetic abnormalities in AML. a SPAG1 expression in AML patients with and without chromosome 8 abnormalities from TCGA datasets. b SPAG1 expression in AML patients with and without FLT3 mutations from TCGA datasets. c SPAG1 expression in AML patients with and without DNMT3A mutations from TCGA datasets. d SPAG1 expression in AML patients with and without WT1 mutations from TCGA datasets

Further confirmation of the prognostic value of SPAG1 expression in AML in the TCGA dataset

Since a significant relationship was observed between SPAG1 expression and some common prognostic factors such as WBC counts, cytogenetics and gene mutations, we performed multivariate Cox regression analysis to confirm the effect of SPAG1 expression on survival. This analysis demonstrated that SPAG1 expression acted as an independent risk factor affecting OS and LFS in whole-cohort AML (both P < 0.001), non-M3 AML (P = 0.003 and 0.005, respectively), and CN-AML patients (P = 0.001 and 0.007, respectively) (Table 3 and Additional file 1: Table S3).

Table 3 Cox regression multivariate analysis of variables for overall survival in AML patients

Mutations in FLT3, DNMT3A and WT1 are widely accepted factors that influence AML prognosis [2, 3]. Since SPAG1 expression was significantly or nearly significantly correlated with FLT3, DNMT3A and WT1 mutations in this study, we further investigated the prognostic value of SPAG1 expression in AML independent of these gene mutations. As Fig. 3 shows, both AML and CN-AML patients with higher SPAG1 expression also exhibited markedly shorter OS and LFS times than those with lower SPAG1 expression, regardless of the mutation status of FLT3 (Fig. 3a), DNMT3A (Fig. 3b), WT1 (Fig. 3c) or all three genes (Fig. 3d).

Fig. 3 The impact of SPAG1 expression on survival of AML patients with specific subtypes. a Kaplan–Meier survival curves of overall survival and disease/leukemia free survival in whole-cohort AML and CN-AML without FLT3 mutation from TCGA datasets. b Kaplan–Meier survival curves of overall survival and disease/leukemia free survival in whole-cohort AML and CN-AML without DNMT3A mutation from TCGA datasets. c Kaplan–Meier survival curves of overall survival and disease/leukemia free survival in whole-cohort AML and CN-AML without WT1 mutation from TCGA datasets. d Kaplan–Meier survival curves of overall survival and disease/leukemia free survival in whole-cohort AML and CN-AML without FLT3/DNMT3A/WT1 mutation from TCGA datasets

SPAG1 expression is a prognostic indicator for AML after HSCT in the TCGA dataset and may have a guiding effect on treatment choice between chemotherapy and HSCT

HSCT is an important consolidation treatment regimen against disease recurrence in AML. To explore whether HSCT could nullify the negative prognostic effect of higher SPAG1 expression in AML, we analyzed the effect of HSCT on prognosis in both the lower and higher SPAG1 expression groups. In the higher SPAG1 expression group, patients who received HSCT after induction therapy had markedly longer OS and LFS than those who received chemotherapy only (both P < 0.001, Fig. 4). However, there were no obvious differences in OS and LFS between the HSCT and chemotherapy groups among AML patients with lower SPAG1 expression (P = 0.131 and 0.157, respectively, Fig. 4). Taken together, AML patients with SPAG1 overexpression may benefit from HSCT, which suggests that SPAG1 expression may be used to guide the choice between HSCT and chemotherapy in AML patients after induction therapy.

Fig. 4 The effect of HSCT on survival of AML patients among different SPAG1 expression groups. Kaplan–Meier survival curves of overall survival and disease/leukemia free survival among whole-cohort AML in both lower and higher SPAG1 expression group from TCGA datasets

Molecular signatures associated with SPAG1 expression in AML in the TCGA dataset

To explore the biological network associated with abnormal SPAG1 expression in AML, we first compared the transcriptomes of AML samples with lower and higher SPAG1 expression in the TCGA set. A total of 429 mRNAs and 13 miRNAs were found to be differentially expressed between the two groups based on the following conditions: |log2 FC| > 1.5, FDR < 0.05 and P < 0.05 (Fig. 5a–c and Additional file 2: Table S4). Among these DEGs, 206 mRNAs and 7 miRNAs were positively correlated with SPAG1 expression, whereas 223 mRNAs and 6 miRNAs were negatively correlated with SPAG1 expression. Positively correlated genes such as MECOM were reported to have pro-leukemia effects [16] and were associated with prognosis in AML. Negatively correlated genes such as RUNX1T1 and LEP were reported to have anti-leukemia effects and were also informative for AML prognosis [13]. Moreover, GSEA revealed that SPAG1 might participate in HOXA9 dysregulation associated with AML (Fig. 5d).

Fig. 5 Molecular signatures associated with SPAG1 in AML. a Expression heatmap of differentially expressed genes between AML patients with lower and higher SPAG1 expression among TCGA datasets (FDR < 0.05, P < 0.05 and |log2 FC| > 1.5). b Volcano plot of differentially expressed genes between AML patients with lower and higher SPAG1 expression (FDR < 0.05, P < 0.05, and |log2 FC| > 1.5). c Expression heatmap of differentially expressed microRNAs between AML patients with lower and higher SPAG1 expression (FDR < 0.05, P < 0.05, and |log2 FC| > 1.5). d GSEA analysis of SPAG1 expression associated with HOXA9 dysregulation in AML (NOM P < 0.05 and FDR Q < 0.05)

Validation of SPAG1 expression and its clinical significance in AML in our research cohort

To verify the expression pattern and clinical significance of SPAG1 in AML, we further investigated SPAG1 mRNA expression in BMMNC samples from 86 AML patients at diagnosis, 45 AML patients in CR and 15 healthy donors collected in our hospital. As expected, SPAG1 expression was significantly increased in newly diagnosed AML patients compared with healthy controls and AML patients in CR (both P < 0.001, Fig. 6a). Moreover, ROC analysis revealed that SPAG1 expression may serve as a quantifiable biomarker for distinguishing AML from controls, with an AUC of 0.857 (95% CI: 0.783–0.930) (P < 0.001, Fig. 6b). Notably, AML patients who did not achieve CR after 1–2 courses of induction therapy exhibited markedly higher SPAG1 expression levels at diagnosis than those who achieved CR (P = 0.020, Fig. 6c). Using the cutoff value of 1.0198 determined by ROC analysis (sensitivity of 66.3% and specificity of 100%), we divided AML patients into two groups to analyze the prognostic significance of SPAG1 expression. Kaplan–Meier analysis demonstrated significantly shorter OS in AML patients with high SPAG1 expression than in those with low SPAG1 expression (P = 0.034, Fig. 6d).

Fig. 6 Validation of SPAG1 expression and its clinical significance in AML. a SPAG1 expression in 15 controls, 86 AML patients at diagnosis time, and 45 AML patients who achieved complete remission. b ROC curve analysis of SPAG1 expression in distinguishing AML from controls. c SPAG1 expression at diagnosis time in AML patients who did and did not achieve CR after 1–2 course induction therapy. d Kaplan–Meier survival curves of overall survival regarding SPAG1 expression in whole-cohort AML from our hospital

Discussion

Recent evidence has characterized SPAG family member expression together with its functional roles in cancer development. For example, SPAG1 expression may be involved in the early spread and adverse outcome of pancreatic adenocarcinoma and prostate cancer [17, 18]. SPAG2/UAP1 has been shown to be a promising therapeutic target for bladder cancer as well as lung adenocarcinoma [19, 20]. SPAG4 could act as a potential biomarker of progression and prognosis in glioblastoma, as well as in renal cell carcinoma and lung carcinoma [21,22,23]. Moreover, SPAG5 overexpression was connected to poor disease-free survival in breast cancer patients and fueled breast cancer cell proliferation [24]. Interestingly, reduced expression of SPAG6, which is transcriptionally regulated by tumor-specific DNA methylation, has been revealed in non-small-cell lung cancer [25]. A direct role for aberrant SPAG9 was identified in diverse human cancers such as Kaposi’s sarcoma, gastric cancer, prostate cancer, thyroid carcinoma, liver cancer, and bladder transitional cell carcinoma [26,27,28,29,30,31,32]. Notably, accumulating studies have shown that SPAG6 expression is correlated with the pathogenesis of myelodysplastic syndrome (MDS) and Burkitt lymphoma (BL) [33,34,35,36,37]. Consequently, SPAG proteins serve as a novel type of CT antigen with contributions to cancer formation and are likely to be novel targets for tumor-targeted therapies.

This study was the first to reveal that, among all SPAG family members, SPAG1 expression is uniquely associated with poor prognosis in AML, through both analysis of public data and validation in a research cohort. It was demonstrated that SPAG1 expression could be a promising prognostic biomarker and could be used to optimize the choice of therapy between chemotherapy and HSCT in AML. Unlike SPAG6, SPAG1 expression has rarely been studied in hematological malignancies. By contrast, several reports have examined the relationships between SPAG1 and solid tumors. Shamsara et al. demonstrated that the amplification of SPAG1 was associated with decreased survival in patients with prostate cancer [18]. Moreover, SPAG1 is an early expressed gene in pancreatic tumorigenesis and can promote the activity of cancer cells [17]. Lin et al. showed that SPAG1 expression was a crucial variable related to many clinicopathological features and to RFS in breast cancer [38]. Functionally, SPAG1 acts as an inhibitor of breast cancer cell proliferation and colony formation during breast cancer pathogenesis and development [38]. Since the role of SPAG1 in AML has not been explored in depth, further mechanistic studies are essential for investigating its possible role in leukemogenesis and AML development.

The current study also identified a significant association between SPAG1 expression and genetic (both cytogenetic and molecular) abnormalities in AML. We first found that SPAG1 expression was associated with FAB-M4/M5 disease, suggesting that SPAG1 may play a role in monocytic differentiation and monocytic leukemogenesis. In terms of cytogenetics, SPAG1 expression was positively correlated with +8 but negatively associated with t(8;21)(q22;q22). Since the SPAG1 gene is located at 8q22.2, it is not surprising that aberrant SPAG1 expression was associated with these chromosome abnormalities. Notably, further studies are needed to determine whether these chromosomal abnormalities contribute to leukemogenesis through aberrant SPAG1 expression. Regarding gene mutations, SPAG1 expression was associated with FLT3 and DNMT3A mutations in AML, but the exact relationship between SPAG1 expression and these gene mutations remains unclear. Importantly, there is no evidence showing an association of FLT3 and DNMT3A mutations with the above chromosome abnormalities. Consequently, deeper insight is needed into the underlying mechanism of SPAG1 expression in leukemogenesis caused by FLT3 and/or DNMT3A mutations.

In conclusion, our findings suggest that SPAG1 overexpression may function independently as a prognostic biomarker and may assist treatment selection between HSCT and chemotherapy in AML.