Background

Adult T-cell leukemia/lymphoma (ATLL) is a virus-induced cancer that develops after infection with human T-cell leukemia virus type 1 (HTLV-1) [1]. ATLL is characterized by aggressive, malignant T-cell lymphoproliferation that arises in infected individuals, typically after a long latency period [2]. The prevalence of ATLL is approximately 5% among HTLV-1-infected cases. Based on the Shimoyama classification, ATLL is categorized into four major subtypes: acute, lymphoma, chronic, and smoldering. The first two are aggressive with a poor prognosis, whereas the last two follow an indolent clinical course with disparate clinicopathologic characteristics. The acute type is the most common and is usually associated with leukemia and high levels of serum lactate dehydrogenase. The lymphoma cells are present in the blood and affect the bones, skin, lymph nodes, spleen, and liver. Lymphomatous ATLL is infrequent and grows quickly. It can also involve the brain and spinal cord, along with enlargement of the lymph nodes. Chronic ATLL develops slowly, similar to the smoldering type, and is accompanied by elevated T-cell and lymphocyte counts in the blood. It can involve the lungs, skin, spleen, liver, and lymph nodes. Smoldering ATLL can also affect the lungs and skin and is accompanied by abnormal T-cell counts [3,4,5].

MicroRNAs (miRNAs) are a class of non-coding RNAs, approximately 19–25 nucleotides in length, that regulate the expression of different genes. They affect various biological functions such as proliferation, cell cycle, apoptosis, differentiation, and immune response. Possible roles of miRNAs in tumorigenesis and in the progression of ATLL have been reported [6,7,8].

Different ATLL subtypes have a poor prognosis because of intrinsic chemoresistance and severe immunosuppression, in addition to their heterogeneous presentation. The combination of chemotherapy drugs and miRNAs could be a promising treatment approach for ATLL [9]. Several papers have reported genes and miRNAs implicated in the progression of ATLL without considering the different subtypes [10,11,12]. Therefore, exploring miRNA-gene interactions in the various ATLL subtypes with computational algorithms, in order to propose potential therapeutic targets, could be advantageous.

Weighted gene co-expression network analysis (WGCNA) is a powerful method that clusters genes based on the correlations between their expression profiles. The identified clusters, called modules, contain groups of co-expressed genes that likely participate in the same biological pathways. Moreover, assessing the preservation of the identified modules in external data can help identify the specific modules involved in disease [10].

We recently used machine learning to classify different ATLL subtypes based on mRNA and miRNA datasets [9]. However, we could only find one common miRNA and a few genes for each subtype. In this study, we employed the weighted gene co-expression method to find specific coding and non-coding RNA interactions for three subtypes of ATLL. This approach may shed light on the pathogenic mechanisms underlying the progression from asymptomatic carriers (ACs) to each ATLL subtype.

Materials and Methods

Gene expression datasets and preprocessing

The microarray gene expression datasets GSE33615 [13], GSE55851 [14], GSE29312 [15], and GSE29332 [15] were downloaded from the Gene Expression Omnibus (GEO) database. The first two datasets include gene expression levels in peripheral blood mononuclear cells (PBMCs) or whole blood of patients with the acute, chronic, or smoldering ATLL subtype. The last two datasets contain gene expression levels in the PBMCs of ACs. In total, 29, 23, and 10 subjects with acute, chronic, and smoldering ATLL, respectively, as well as 37 ACs were used for further analysis. In addition, the GSE31629 [13] and GSE46345 [16] datasets, containing miRNA expression levels of 40 ATLL and 12 AC subjects, were employed to analyze the non-coding RNA data. The dataset details are given in Table 1. Possible batch effects among datasets were removed using the removeBatchEffect function in the Limma package (version 3.54) in the R 4.2.2 environment [10,17,18,19,20,21]. The data were also quantile normalized.
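As an illustration of the quantile normalization step mentioned above, the following minimal Python sketch forces every sample (column) of a toy genes-by-samples matrix to share the same empirical distribution. It is not the code used in this study, which relied on R; the function name `quantile_normalize`, the toy values, and the simple tie handling (ties broken by row order) are assumptions made for illustration.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Simplification (assumption): ties within a sample are broken by row
    order rather than averaged.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[k] for col in sorted_cols) / n_samples for k in range(n_genes)
    ]

    # Replace each value by the reference value of the same rank.
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Toy expression matrix: 4 genes (rows) x 3 samples (columns), invented values.
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(toy)
```

After normalization, each column contains exactly the same set of values, so differences between samples that stem only from their overall intensity distributions are removed.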

Table 1 Details of the datasets involved in the analysis

Weighted gene co-expression network

The weighted gene co-expression network was constructed with the R package “WGCNA” version 1.71 [22]. WGCNA was used to find clusters of co-expressed genes that are likely involved in similar biological pathways. To identify these clusters, known as modules, an adjacency matrix was first calculated from the Pearson correlation between pairs of genes/miRNAs, raised to the optimized soft-thresholding power. The “pickSoftThreshold” function was used to evaluate the scale-free topology fit index across different soft-thresholding powers β. Afterward, the adjacency matrix was transformed into the Topological Overlap Matrix (TOM). Highly co-expressed genes were then grouped using hierarchical clustering. Next, the dynamic tree cut algorithm was applied to cut the dendrogram branches and identify gene modules. Closely related modules were merged using the mergeCloseModules function.
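The first two steps described above (soft-thresholded adjacency and topological overlap) can be sketched in plain Python as follows. This is a hedged illustration on invented data, not the WGCNA implementation: it assumes an unsigned network, in which the adjacency is the absolute Pearson correlation raised to β and the diagonal is set to zero, and it omits hierarchical clustering, tree cutting, and module merging. All function names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta (assumption); diagonal 0."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene (diagonal is 0)
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(g) if u not in (i, j)
            )
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Invented expression profiles: 4 genes measured in 5 samples.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # gene A
    [2.0, 4.1, 5.9, 8.2, 9.8],  # gene B, nearly proportional to gene A
    [5.0, 3.0, 4.0, 1.0, 2.0],  # gene C
    [2.0, 1.0, 4.0, 3.0, 5.0],  # gene D
]
adj = adjacency(expr, beta=7)
tom = topological_overlap(adj)
# 1 - TOM is the dissimilarity that would be passed to hierarchical clustering.
dissimilarity = [[1.0 - t for t in row] for row in tom]
```

Raising the correlation to a power β suppresses weak correlations while retaining strong ones, so in this toy example genes A and B end up with the largest topological overlap.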

Identification of specific modules for each subtype

In this step, the preservation of the modules of each individual ATLL subtype in the ACs expression dataset was determined. To this end, the “modulePreservation” function in the WGCNA package (version 1.71) was used. The module preservation statistics provide a measure of whether a module is preserved or not preserved between a reference network and a test network [23]. In this study, the co-expression networks of the ATLL subtypes were considered the reference networks and that of the ACs the test network. The same analysis was performed for the miRNA dataset. Zsummary (\(\frac{Z_{\text{density}} + Z_{\text{connectivity}}}{2}\)) and medianRank (\(\frac{\text{medianRank}_{\text{density}} + \text{medianRank}_{\text{connectivity}}}{2}\)) were calculated to determine the preservation of modules. Zsummary and medianRank each combine various preservation statistics into a single measure of preservation, and both are important for deciding whether a network module is preserved. In this study, Zsummary indicates whether modules identified in the ATLL datasets remain highly connected in the ACs dataset (density) and whether the connections between the genes in each module are the same in the ATLL and ACs datasets (connectivity) [24]. The medianRank is useful for comparing preservation among several modules: a module with a higher medianRank shows weaker preservation than a module with a lower medianRank. It is largely independent of module size [23].
Modules with Zsummary < 2 and medianRank ≥ 8 were regarded as gene co-expression modules that are not preserved in the ACs group and are therefore specific to each ATLL subtype [25,26,27]. Moreover, Zsummary < 2 was used to determine the specific miRNA co-expression modules for ATLL.
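The thresholds above can be expressed as a short filter. The Python sketch below is only an illustration: the module names and statistics are invented placeholders, not results of this study, and the helper names `z_summary` and `is_subtype_specific` are hypothetical.

```python
def z_summary(z_density, z_connectivity):
    """Composite statistic: mean of the density and connectivity Z scores."""
    return (z_density + z_connectivity) / 2.0


def is_subtype_specific(z_sum, median_rank, z_cutoff=2.0, rank_cutoff=8):
    """Gene-module criterion from the text: Zsummary < 2 and medianRank >= 8."""
    return z_sum < z_cutoff and median_rank >= rank_cutoff


# Hypothetical module statistics, invented for illustration only.
modules = {
    "module_A": {"z_density": 1.2, "z_connectivity": 0.6, "median_rank": 9},
    "module_B": {"z_density": 12.0, "z_connectivity": 8.0, "median_rank": 2},
    "module_C": {"z_density": 1.0, "z_connectivity": 1.5, "median_rank": 4},
}

specific = sorted(
    name
    for name, s in modules.items()
    if is_subtype_specific(
        z_summary(s["z_density"], s["z_connectivity"]), s["median_rank"]
    )
)
```

In this toy input only `module_A` meets both conditions; `module_C` has a low Zsummary but a medianRank below 8, so it would not be retained under the gene-module criterion.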

Determining differentially expressed genes and miRNAs

To determine the differentially expressed genes (DEGs) and differentially expressed miRNAs (DEMs) between the ATLL and ACs groups, the Bioconductor package Limma (version 3.54) was employed. Statistically significant DEGs and DEMs were identified by applying a Benjamini-Hochberg adjusted p-value [28] cutoff of < 0.05.
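For readers unfamiliar with the Benjamini-Hochberg adjustment, the following Python sketch shows how adjusted p-values are obtained and filtered at 0.05. It is a generic illustration with invented p-values, not the Limma code path used in this study; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease with increasing rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for five hypothetical genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)

# Indices of genes that pass the adjusted p-value cutoff of 0.05.
significant = [i for i, q in enumerate(adj_p) if q < 0.05]
```

Here the third and fourth genes have raw p-values below 0.05 but adjusted values of about 0.051, so only the first two genes are retained.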

Identification of target genes for miRNAs

The unique DEGs in the subtype-specific modules of each ATLL subtype were determined (U_DEGs). Moreover, the unique DEMs in the preserved modules in ATLL were also found (U_DEMs). Next, the miRTarBase database, which contains experimentally validated miRNA-target gene interactions, was searched to determine the target genes of the U_DEMs [8]. Afterward, the genes common to these target genes and the U_DEGs were determined (C_DEGs). Finally, the miRNA-gene interactions were visualized in Cytoscape 3.6.1.
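The set operations described above can be sketched in Python as follows. All gene and miRNA identifiers are placeholders invented for illustration, the target table stands in for a miRTarBase export, and the helper `unique_per_group` is hypothetical; this is a sketch of the logic, not the pipeline used in this study.

```python
def unique_per_group(groups):
    """Keep, for each group, only the items found in no other group."""
    unique = {}
    for name, items in groups.items():
        others = set().union(*(v for k, v in groups.items() if k != name))
        unique[name] = set(items) - others
    return unique


# Placeholder DEGs per subtype (invented identifiers).
degs = {
    "acute": {"GENE1", "GENE2", "GENE5"},
    "chronic": {"GENE2", "GENE3"},
    "smoldering": {"GENE4", "GENE5"},
}

# Placeholder table of validated miRNA targets (stand-in for miRTarBase).
validated_targets = {
    "miR-a": {"GENE1", "GENE3"},
    "miR-b": {"GENE4", "GENE9"},
}

u_degs = unique_per_group(degs)

# For each subtype, keep miRNA-gene pairs whose target is a unique DEG.
edges = {
    subtype: sorted(
        (mirna, gene)
        for mirna, targets in validated_targets.items()
        for gene in targets & genes
    )
    for subtype, genes in u_degs.items()
}
```

The resulting `edges` dictionary holds one edge list per subtype, which is the kind of table that can be imported into a network viewer such as Cytoscape.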

Stepwise method to perform analysis

The steps of the analyses performed in this study are shown in a flowchart (Fig. 1). Briefly, we first prepared the data for further analysis by merging the different datasets and preprocessing them. Then, we constructed the weighted gene/miRNA co-expression networks. Afterward, we determined the specific gene modules for each ATLL subtype and the miRNA modules for ATLL by performing module preservation analysis and finding the unique genes in each gene module (U_modules). In the next step, we identified DEGs and DEMs between ATLL and ACs and then found the unique DEGs for each subtype. We further identified the genes shared between the unique DEGs and the genes in U_modules (U_genes), as well as the miRNAs common to the DEMs and the miRNAs in U_modules (U_miRNAs). After determining the target genes of the U_miRNAs, we identified the genes shared between these target genes and the U_genes (C_genes). Finally, we constructed the miRNA-gene interactions between the miRNAs and C_genes for each subtype.

Fig. 1 Flowchart of the step-wise analyses in this study

Results

Construction of WGCNs

A total of 14,837 common genes were used to construct three weighted co-expression networks for the three ATLL subtypes. First, soft-thresholding powers (β) of 7, 17, and 2 were determined as the optimal values to obtain a scale-free topology for the acute, chronic, and smoldering subtypes, respectively. After computing the adjacency matrix with power β, calculating the TOM dissimilarity, performing hierarchical clustering, cutting the clusters, and finally merging the close clusters, nine modules were identified for ATLL_acute, seven modules for ATLL_chronic, and nine modules for ATLL_smoldering (the grey module contains the genes that are not assigned to any of the modules). Figure 2a-c shows the dendrogram and the identified modules, each marked by a unique color, for each subtype. Moreover, a weighted co-expression network was constructed for the miRNA ATLL samples. No dataset with miRNA expression for each ATLL subtype is available, so we analyzed miRNA expression in ATLL regardless of subtype. A β of 10 was determined as the optimal value to reach a scale-free topology. Figure 3 shows the dendrogram and the four obtained modules.

Fig. 2 Dendrogram of clustered genes constructed by WGCNA based on (1-TOM) for (a) ATLL acute subtype (ATLL_acute), (b) ATLL chronic subtype (ATLL_chronic), and (c) ATLL smoldering subtype (ATLL_smoldering) with the specified module colors. Each color denotes a module (group of genes) determined by the dynamic tree cut algorithm before and after merging modules

Fig. 3 Dendrogram of clustered miRNAs constructed by WGCNA based on (1-TOM) for the miRNA dataset of ATLL with the specified module colors. Each color denotes a module (group of miRNAs) determined by the dynamic tree cut algorithm before and after merging modules

Identification of non-preserved modules

To identify the specific modules for each of the three ATLL subtypes, their preservation in the ACs dataset was investigated. Modules with medianRank ≥ 8 and Zsummary < 2 were considered specific, non-preserved gene modules, and Zsummary < 2 was used for miRNA modules. Figure 4a-c shows the plots of Zsummary scores and Fig. 4d-f shows the plots of medianRank scores versus module size for ATLL_acute, ATLL_chronic, and ATLL_smoldering, respectively (Supplementary data file 1). Accordingly, the blue4 and coral4 modules in ATLL_acute, the darkorange and navajowhite2 modules in ATLL_chronic, and the darkseagreen2 module in ATLL_smoldering were found to be specific, subtype-related modules. Figure 5a,b presents the plots of Zsummary and medianRank scores for ATLL_miRNA and shows the preservation of the turquoise and yellow modules in ATLL (Supplementary data file 1). Next, we determined the unique genes in each specific module among all ATLL subtypes. Since these genes are not present in any other module, we referred to them as unique modules (U_modules, Supplementary data file 2). The miRNAs in the preserved modules in ATLL (turquoise and yellow) were also considered U_modules. In the next step, we determined the DEGs between each ATLL subtype and the ACs samples, as well as the DEMs between the ATLL and ACs samples, using an adjusted p-value < 0.05. Then, the unique DEGs for each subtype were identified (Supplementary data file 3). Afterward, the genes/miRNAs common to each U_module and the DEGs/DEMs, called U_genes/U_miRNAs, were found (Supplementary data file 4).

Fig. 4 Preservation Zsummary (a-c) and medianRank (d-f) versus module size for ATLL acute subtype (ATLL_acute), ATLL chronic subtype (ATLL_chronic), and ATLL smoldering subtype (ATLL_smoldering), respectively. The modules below the dashed line (Zsummary < 2 and medianRank ≥ 8) are the specific modules for each ATLL subtype

Fig. 5 Preservation (a) Zsummary and (b) medianRank versus module size after constructing a weighted miRNA co-expression network. The modules below the dashed line (Zsummary < 2 and medianRank ≥ 8) are the specific modules for ATLL.

Constructing miRNA‑gene interactions

To find the experimentally validated target genes of the U_miRNAs, the miRTarBase database was searched (Supplementary data file 5). Next, the genes shared between the target genes and the U_genes (C_genes) were identified for each subtype. As a result, the following interactions were found (Fig. 6): miR-29b-2-5p and miR-342-3p with LSAMP in ATLL_acute; miR-342-5p with FOXRED2, miR-342-3p with ZNF280B, and miR-575 with UBN2 in ATLL_chronic; and miR-1225-3p and miR-940 with CDCP1, miR-423-3p and miR-940 with C6orf141, and miR-324-3p with COL14A1 in ATLL_smoldering. The identified miRNA-gene interactions may be involved in the pathogenesis and development of each subtype. Moreover, the unique genes and miRNAs in these interactions could be considered potential biomarkers.

Fig. 6 The unique miRNA-gene interactions for (a) ATLL_acute, (b) ATLL_chronic, (c) ATLL_smoldering.

Discussion

Identifying the potential roles of genes and miRNAs in the development of each ATLL subtype is crucial for understanding the pathogenesis and identifying therapeutic targets. In this study, we used weighted gene co-expression analysis to identify the particular co-expressed genes in three subtypes of ATLL. Below, we discuss the identified genes and miRNAs that probably play major roles in the progression of each ATLL subtype.

In the acute subtype, the LSAMP gene and its interactions with miR-29b-2-5p and miR-342-3p were identified. LSAMP encodes a neuronal surface glycoprotein present in the subcortical and cortical regions of the limbic system. LSAMP can be involved in tumor suppression and neuropsychiatric disorders [29, 30]. Furthermore, miR-29b-2-5p and miR-342-3p inhibit cell proliferation and promote apoptosis. Their functions have been described in several cancers, such as pancreatic ductal adenocarcinoma, cervical cancer, and non-small cell lung cancer [31,32,33]. The lower expression of LSAMP may be related to the higher expression of miR-29b-2-5p and miR-342-3p, which ultimately results in tumor suppression [30].

In the chronic subtype of ATLL, FOXRED2 and ZNF280B were found to interact with miR-342-5p and miR-342-3p, respectively, and UBN2 was found to interact with miR-575. FOXRED2 is an unstable protein that is probably implicated in the ubiquitin-dependent ERAD pathway and is essential for the modulation of the proteasome [34]. Proteasome inhibitors induce apoptosis, which can have an antitumor effect [35]. The function of FOXRED2 in cancer is not yet fully understood, and further studies are required to investigate its role in chronic ATLL. ZNF280B is known as an oncogene that encodes a transcription factor inducing the overexpression of MDM2. MDM2 promotes tumor formation and cancer cell growth by targeting tumor suppressor proteins such as p53 [36, 37]. MiR-342-5p is a downstream molecule of Notch signaling implicated in the regulation of endothelial cells (ECs) during angiogenesis. Its higher expression attenuates angiogenesis and promotes endothelial-to-mesenchymal transition (EndMT). MiR-342-5p likely acts as a tumor suppressor and may also suppress migration and cell proliferation [38, 39]. Similarly, miR-342-3p represses cell growth and proliferation and also inhibits migration and invasion [32, 40]. The overexpression of these two miRNAs, by targeting ZNF280B and FOXRED2, could suppress tumorigenesis and cell proliferation in chronic ATLL.

On the other hand, UBN2 is a nuclear protein capable of interacting with several transcription factors. It acts as an oncogene that can be involved in the proliferation and tumorigenicity of cancer cells [41]. As a component of a histone chaperone complex, UBN2 can contribute to the transcription of the KRAS gene. KRAS signaling can regulate the cell cycle through phosphorylation and inhibition of p21 and p27, thereby modulating cyclin D1 [42]. UBN2 is targeted by miR-575, an oncomiR that can promote cell proliferation and migration in some cancer cells and possibly in chronic ATLL [43,44,45].

In the smoldering subtype of ATLL, CDCP1, C6orf141, and COL14A1 were found. CDCP1 is a protein known to be implicated in multiple malignancies. It is associated with important tumorigenic signaling cascades, including the PI3K/AKT, SRC/PKCδ, RAS/ERK, and WNT axes and the oxidative pentose phosphate pathway [46]. Therefore, CDCP1 is a notable therapeutic and diagnostic target [47]. C6orf141 has been described as a tumor suppressor protein in oral cancer. Its promoter CpG islands are methylated in some cancers, and this methylation is associated with alterations in high-density lipoprotein [48]. COL14A1 is another gene whose role in cancer is not fully understood. It is methylated in renal cell carcinoma, where it may act as a tumor suppressor. Its methylation is associated with a poorer prognosis independent of tumor grade, size, and stage [46]. In addition, COL14A1 has been reported to play an important role in maintaining the stem cell-like and self-renewal features of liver cancer stem cells through the activation of ERK signaling [47]. MiR-940 inhibits the proliferation and migration of cancer cells, and miR-1225-3p has been implicated in malignancy. These two miRNAs interact with CDCP1 [48, 49]. Moreover, miR-423-3p is an oncomiR that promotes cancer cell proliferation by driving the G1/S transition of the cell cycle [50, 51]. Together with miR-940, it targets C6orf141 in smoldering ATLL. On the other hand, miR-324-3p, which targets COL14A1, suppresses the invasion and growth of some cancer cells by increasing apoptosis [52]. It has also been proposed that the miR-324-3p/Smad4/Wnt signaling axis could be a therapeutic target to block cancer progression [53]. However, more studies are needed to clarify its precise role in tumorigenesis.

On the whole, the miRNA-gene interaction networks that may contribute to the pathogenesis of each ATLL subtype were proposed. However, these networks represent only a small fraction of the complex network involved in ATLL development, and additional data are required to unveil the complete network. Therefore, future studies with larger cohorts are necessary to determine the comprehensive interaction of genes and miRNAs in each ATLL subtype.

Conclusion

In summary, we found genes and miRNAs that could be significantly involved in the pathogenesis of three ATLL subtypes. The step-wise analysis revealed unique genes/miRNAs in the identified interactions, including LSAMP and miR-29b-2-5p in the acute subtype; FOXRED2, UBN2, miR-342-5p, and miR-575 in the chronic subtype; and CDCP1, C6orf141, COL14A1, miR-1225-3p, miR-940, miR-423-3p, and miR-324-3p in the smoldering subtype. These genes and miRNAs could serve as potential biomarkers. However, their efficacy should be confirmed through experimental studies.