Introduction

Spinal cord injury (SCI) refers to the damage of the spinal cord due to various pathogenic factors and is a common clinical disease in orthopedics [1]. SCI is a multi-stage, multi-pathway, and multi-factorial pathological process that often leads to severe neurological dysfunction [2]. This includes primary injuries in the early stages of spinal cord injury and consequent secondary ischemia, edema, and secondary damage caused by reperfusion after ischemia. Due to the non-renewable characteristics of nerves, especially the central nervous system, and high disability rate, it has been plaguing the majority of patients, bringing psychological and economic burden [3]. Studies have shown that primary damage caused by the early stage of spinal cord injury leads to the loss of a large number of nerve cells, and the subsequent secondary damage causes neuronal apoptosis, loss of nerve cells, and non-renewable, and keratinocyte regeneration. The formation of nerve scars also hinders the growth of nerve fibers, and exogenous factors hinder regeneration after spinal cord injury [4]. However, in recent years, a large number of studies have shown that endogenous factors may have a greater effect on nerve repair after spinal cord injury [5].

SCI often causes neuropathic pain (NP). Patients with NP are not only unbearable, but long-term pain affects their sleep, work, and life [6]. Research on this field is currently based on animal models, but research reports on human whole blood are rare.

Previous studies have shown that changes in different levels of gene expression will eventually lead to dysregulation of gene expression. Among these regulatory factors, non-coding RNA is getting more and more attention from scholars. Non-coding RNAs are a class of genomic transcription products that have no protein-coding function. More and more studies have confirmed that non-coding RNAs, although not translated into proteins, are involved in all levels of gene expression [7].

Recent studies have shown that long noncoding RNAs are a class of ncRNAs that are greater than 200 nt in length and lack the ability to encode proteins. Compared to RNA encoding proteins, lncRNAs are shorter in length, fewer exons, less coding, and are tissue or cell specific [8].

Spinal cord injury is a complex pathological process involving a large number of cellular and molecular changes. In recent years, many studies have found that lncRNA is enriched in the central nervous system of mice and affects the development of nerves, and is closely related to many nervous system diseases. LncRNA can regulate the expression of coding genes at different levels, and the mechanism is complex [9]. Therefore, studying the differential expression of lncRNA in spinal cord injury will help us further clarify the pathological process of spinal cord injury. At present, more and more studies on the function of lncRNA have found that its function is complex, and it can participate in the regulation of various stages of gene expression, and exert its effects by affecting the molecular level. Numerous evidences suggest that differential expression of lncRNA is closely related to many human diseases [10]. However, there has not been a study on lncRNA expression spectrum in the pathological process of spinal cord injury. By analyzing the signaling pathways of differential genes, it is helpful to study genes and expressions as a whole network. The greater the enrichment value, the closer the differential gene is involved in the signal pathway [11].

Based on the molecular expression level, this paper compares lncRNA and mRNA expressions respectively in peripheral blood samples of the patients after SCI with NP and without NP, and screens disease-related biomarkers related to NP after SCI in peripheral blood samples of patients.

Materials and methods

Grouping and preprocessing of experimental data

We downloaded the data numbered E-GEOD-69901 from the European Bioinformatics Institute (EBI) ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/) [12]. The detection platform is GPL15207 (PrimeView) Affymetrix Human Gene Expression Array. The data included 25 human peripheral blood samples, 12 of which were samples of refractory NP after SCI, and 13 were control samples without NP after SCI (Supplementary Table 1). The original format provided by the database we downloaded is the expression spectrum of CEL. The data in the original CEL format is converted using oligo Version 1.41.1 [13] (http://www.bioconductor.org/packages/release/bioc/html/oligo.html) in the R3.4.1 language. The median method was used to complement the missing values, the background correction (MAS method), and the quantile method were used to normalize the data.

Screening of significant differentially expressed RNA

Download the detailed comment information of the detection platform GPL15207 corresponding to the E-GEOD-69901 data set (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL15207). Based on the information provided by the platform, such as Transcript ID, RefSeq ID, and location, the mRNA and lncRNA in the expression spectrum were re-annotated using 4328 lncRNAs and 19,029 protein-coding genes contained in the HUGO Gene Nomenclature Committee (HGNC) [14] (http://www.genenames.org/) database.

According to the sample information, the sample was divided into 12 SCI-pain and 13 SCI-no_pain CTRL control sample groups. Then, using the Limma Version 3.34.0 [15] (https://bioconductor.org/packages/release/bioc/html/limma.html) in the R3.4.1 language, the differential expression FDR values and the expression fold change values of the genes between the comparison groups were calculated. FDR values < 0.05 and |logFC| > 0.5 were selected as thresholds for screening for significant differentially expressed RNA. Based on the expression level of RNAs obtained by screening, using the pheatmap Version 1.0.8 in R3.4.1 language [16] (https://cran.r-project.org/package=pheatmap), the expression values were hierarchically clustered [17, 18] based on Euclidean bidirectional and displayed by heat map. Subsequently, the mRNA in the significant differentially expressed RNA obtained by screening was subjected to a DAVID 6.8-based [19, 20] (https://david.ncifcrf.gov/) GO node analysis and a significant enrichment analysis of the KEGG signaling pathway. Screening for GO annotations that are significantly associated with significant differentially expressed mRNAs, as well as significant involvement of the KEGG pathway.

Screening for significantly related modules and RNAs for disease characterization using the WGCNA algorithm

Weighed gene co-expression network analysis (WGCNA) is a typical system biology algorithm for constructing co-expression networks. The algorithm is based on high throughput expression data. Firstly, it is assumed that the constructed network obeys the scale-free network, and defines the co-expressing correlation matrix, the adjacency function formed by the network, and then calculates the different coefficients of different nodes to identify the set modules associated with the disease [21, 22]. Here, we use the WGCNA package in R3.4.1 [23] (https://cran.r-project.org/web/packages/WGCNA/) to analyze all RNAs detected in the data set and to screen the modules and the RNA contained therein that are significantly associated with the disease state. The setup RNA module contains a minimum of 100 RNA elements, cutHeight = 0.99.

The significant differentially expressed RNA screened in 2.2 was then mapped into each of the WGCNA color modules obtained in 2.3. By means of f(k,N,M,n) = C(k,M) × C(nk,NM)/C(n,N) () hypergeometric algorithm [24], the significant enrichment parameter fold enrichment and enrichment significance p value of differential RNA in the module are calculated (where N represents all RNA involved in the analysis of the WGCNA algorithm, M represents the number of RNAs in each module obtained by the WGCNA algorithm, n represents the number of significant differences in the number of RNAs screened in 2.2, and k represents the number of significant differentially methylated genes mapped into the corresponding modules). The module screening threshold was chosen to be p < 0.05, fold enrichment > 1.

Construction of co-expression network

The lncRNA and mRNA contained in the target module of the significantly enriched distribution screened in 2.3 were calculated by the cor function (http://77.66.12.57/R-help/cor.test.html) in the R3.4.1 language to calculate their expression level Pearson correlation coefficient (PCC). A co-expression network between the significant differentially expressed RNAs was constructed by expression association. The network is visualized by Cytoscape 3.6.1 [25] (http://www.cytoscape.org/). Thereafter, using DAVID, a significant correlation analysis of the KEGG pathway was performed on genes in the co-expression network.

Construction of the pathway network related to SCI pain

In the Comparative Toxicogenomics Database 2019 update database (http://ctd.mdibl.org/) [26], the “neuropathic pain” was used as a keyword to search for the KEGG pathway directly related to NP after SCI, and the corresponding pathways in the co-expression network were significantly compared, and the overlapping pathways were obtained. The genes involved in the disease pathway were separately extracted from the co-expression network, a network of SCI pain-related pathway was constructed, and important disease-related biomarkers were screened.

Quantitative real-time PCR clinical trial verification

A total of 6 whole blood samples including 3 healthy samples and 3 samples of refractory NP patients after SCI were collected from China-Japan Union Hospital of Jilin University (Changchun, Jilin, China). Total RNA was isolated from whole blood samples using RNAiso Plus (Trizol) (TAKARA, 9109) reagent according to the manufacturer’s protocols. Follow the kit instructions for mRNA reverse transcription and fluorescence quantitative PCR amplification (PrimeScript™RT Master Mix (Perfect Real Time), TAKARA, RR036A; PrimeScript™ II 1st Strand cDNA Synthesis Kit, TAKARA, 6210A; Power SYBR Green PCR Master Mix, Thermo, 4367659). The primers were listed in Table 1. For statistical analysis, the graphing software was Graphpad prism 5 (Graphpad Software, San Diego, CA), p < 0.05 and p < 0.01 were the screening criteria for significant and extremely significant differences.

Table 1 List of the primers

Results

Preprocessing of data and screening of significant differential expression

First of all, the downloaded expression spectrum data set is normalized. The normalized expression values are shown in Supplementary Table 2. The box diagrams before and after normalization are shown in Supplementary Figure 1. After re-annotation by detecting platform information, a total of 289 lncRNAs and 18197 mRNAs were obtained (the annotation information is also shown in the “type” column in Supplementary Table 2). The distribution density curves of lncRNAs and mRNAs after normalization are shown in Supplementary Figure 2. According to the sample disease information, the sample was divided into 12 SCI-pain and 13 SCI-no-pain CTRL control sample groups, and then 338 significant differentially expressed RNAs satisfying the threshold requirement were obtained by Limma co-screening, of which 187 were significantly downregulated (SCI-pain) and 151 significantly upregulated (SCI-no-pain) expression. The significant differentially expressed RNA-log2FC-log10 (FDR) volcano map is shown in Fig. 1a. Green dots indicate significant differentially expressed RNAs, red horizontal dashed lines indicate FDR < 0.05, and two red vertical lines indicate |Log2FC| > 0.5. A list of significant differentially expressed RNAs is shown in Supplementary Table 3. A bidirectional hierarchical clustering heat map based on the significant differentially expressed of RNA expression levels obtained by screening is shown in Fig. 1b. It can be seen from the figure that the RNA expression values obtained by screening can separate different types of samples and have clear colors, indicating that the RNAs screened in the pain and no-pain control groups are characteristic of the samples. Then, the screened mRNAs in significant differentially expressed RNAs were then subjected to DAVID-based GO biological processes and KEGG signaling pathway enrichment analysis annotations. Fifteen significantly related GO biological processes and 11 KEGG signaling pathways were obtained, as shown in Table 2, and the visualization is shown in Fig. 2. GO biological process nodes and KEGG signal point distribution maps that are significantly associated with mRNA in significant differentially expressed RNAs. The horizontal axis represents the number of genes, and the vertical axis represents the name of the item. The color and size of the dots represent significance. The larger the dots, the closer the color is to red, the higher the significance. The results showed that significant differentially expressed mRNAs were significantly associated with biological processes such as hormone secretion and transport, and were significantly involved in KEGG signaling pathways such as ECM-receptor interaction and TGF-beta signaling.

Fig. 1
figure 1

a Significant differentially expressed RNA-log2FC-log10 (FDR) volcano map. Green dots indicate significant differentially expressed RNAs, red horizontal dashed lines indicate FDR < 0.05, and two red vertical lines indicate |Log2FC| > 0.5. b Bidirectional hierarchical clustering heat map based on expression levels of the significant differentially expressed RNAs

Table 2 Annotations of GO biological process nodes and KEGG signaling pathways that are significantly associated with mRNA in significant differentially expressed RNAs
Fig. 2
figure 2

GO biological process nodes and KEGG signal point distribution maps. The horizontal axis represents the number of genes, and the vertical axis represents the name of the item. The color and size of the dots represent significance. The larger the dots, the closer the color is to red, the higher the significance

WGCNA algorithm screens for modules and RNAs that are significantly associated with disease characterization

All RNAs detected by E-GEOD-69901 were subjected to analysis and screening based on WGCNA algorithm. In order to satisfy the preconditions of scale-free network distribution as much as possible, we need to explore the value of power of the adjacency matrix weight parameter: set the network construction parameter selection range and calculate the scale-free topology matrix. As shown in Fig. 3, we select the value of power when the squared value of the correlation coefficient reaches 0.9 for the first time, that is, power = 30. The average node connectivity of the co-expression network constructed at this time is 1, which is completely consistent with the nature of small-world networks. Then calculate the coefficient of dissimilarity between gene points, and obtain a systematic clustering tree. The minimum number of genes for each RNA module is 100, and the pruning height is cutHeight = 0.99. Through Fig. 3, we get the module division tree diagram of Fig. 4a. Each color represents a different module. A total of 10 modules other than gray were obtained by screening; then the correlation between the modules obtained by each division and the disease characterization was calculated. Figure 4b shows that the heat map of the relationship between different color modules and disease characterization. The color from blue to orange indicates a negative to positive change in the relationship with the disease. The results show that a total of 3 modules exhibit a very significant positive correlation with disease characterization: brown, magenta, and pink.

Fig. 3
figure 3

Left: Adjacency matrix weight parameter power selection graph. The horizontal axis represents the weight parameter power, and the vertical axis represents the square of the log(k) and log(p(k)) correlation coefficients in the corresponding network. The red line indicates the normalized line where the square of the correlation coefficient reaches 0.9. Right: Schematic diagram of average connectivity of RNA under different power parameters. The red line indicates the value of the average connectivity of the network node (1) under the value of the power parameter of the adjacency matrix weight parameter on the left

Fig. 4
figure 4

a Module partitioning tree diagram. Each color represents a different module. b Heat map of the relationship between different color modules and disease characterization. The color from blue to orange indicates a negative to positive change in the relationship with the disease

After the calculation according to the hypergeometric enrichment algorithm described in the method, the significant differentially expressed RNAs screened in the previous step are mapped into each module. Figure 5 shows the significant differentially expressed RNAs in the module distribution ratio pie chart (except the gray module). A total of 333 significant differentially expressed RNAs were distributed in each module (except the gray module), and the number distribution was as shown in Fig. 5. Stabilization modules with significantly enriched distribution of differentially expressed RNAs were selected and the results are shown in Table 3. The results showed that the differentially expressed RNAs were significantly enriched in the brown and magenta modules, and combined with the correlation between the previous modules and disease characterization; we used the differentially expressed RNAs in the two modules of brown and magenta as subjects for further analysis. The brown module contains 44 RNAs (2 lncRNA and 42 mRNA), and the magenta module contains 22 RNAs (22 are mRNA). The distribution of significant differentially expressed RNAs in each module is shown in Supplementary Table 4.

Fig. 5
figure 5

Significant differentially expressed RNAs in the module distribution ratio pie chart (except the gray module).

Table 3 Module information statistics

Co-expression network construction

The expression-related PCC between lncRNA and mRNA in the two target modules screened in 3.2 was calculated. A co-expression network of lncRNA-mRNA was constructed by retaining a ligation pair with a PCC above 0.4 (Supplementary Table 5), as shown in Fig. 6. The network contains 117 edges and 63 nodes, of which 2 lncRNA and 61 mRNA, both of which express significantly upregulated expression of RNAs in pain tissue. An enriched annotation analysis of the KEGG signaling pathway is then performed on the mRNAs that make up the co-expression network. Six KEGG signaling pathways were screened, as shown in Table 4. The results show that genes in the co-expression network are significantly involved in biological processes such as injury response, defense response, immune response, and inflammatory response. At the same time, these genes are significantly involved in the KEGG signaling pathways of complement and coagulation cascades, cell adhesion molecules, and ECM-receptor interactions.

Fig. 6
figure 6

LncRNA-mRNA co-expressing genes. Square and circle represent lncRNA and mRNA. The change in node color from light to dark red indicates the change from low to high of logFC

Table 4 KEGG signaling pathways that are significantly associated with mRNAs in the co-expression network

Construction of SCI pain-related pathway network

In the CTD database, “neuropathic pain” was used as a keyword to search for the KEGG pathway directly related to neuropathic pain. A total of 84 KEGG pathways as listed in Supplementary Table 6 was obtained. After comparing with the KEGG signaling pathway involved in the significant expression of mRNA in the co-expression network in the previous step, 3 overlapping pathways were obtained: hsa04020: calcium signaling pathway, hsa04080: neuroactive ligand-receptor interaction, and hsa05200: pathways in cancer. Based on the genes participating in the overlapping pathway, the part involved in the pathway gene is extracted from the co-expression network. Combining lncRNA-gene-pathway linkages, a pain-related network of collaterals was constructed, as shown in Fig. 7. Therefore, we speculate that the 7 genes directly involved in the pain pathway: E2F1, MAX, MITF, CTNNA1, ADORA2B, GRIK3, and OXTR are closely related to NP after SCI. In addition, LINC01119 and LINC02447 have a positive correlation with these genes at the expression level and are co-located in a module that is significantly positively correlated with disease characterization in the WGCNA algorithm results. Therefore, LINC01119 and LINC02447 may be bio-molecules that are closely related to NP after SCI.

Fig. 7
figure 7

Pain-related network. Squares, circles, and triangles represent lncRNA, mRNA, and pain-related pathways. Changes in node color from light to dark red indicate a low-to-high change in logFC. The solid black line indicates the co-expression relationship between lncRNA and mRNA, and the solid green line indicates that the gene is involved in pathway junction

Real-time PCR experiment results

Real-time PCR results showed that, compared with the normal group, the expression levels of E2F1, MAX, MITF, CTNNA1, and ADORA2B in the disease group were all significantly upregulated (p < 0.01). Compared with the normal group, the expression of OXTR was upregulated. There are many samples in the GRIK3 test that failed to detect the CT value. It may be that the expression abundance is too low, which makes it impossible to detect (Fig. 8).

Fig. 8
figure 8

Fluorescence quantitative real-time PCR detection results. Asterisks indicate a significant difference compared with the normal group (p < 0.01)

Discussion

The spinal cord is a vulnerable part of the central nervous system. At present, it is difficult to have effective treatment methods. The patients with spinal cord injury lose their labor and self-care ability, and cause many serious complications. Pathological neuropathic pain is more common after spinal cord injury, and there is currently no effective and effective treatment [2]. At present, the research on neuropathic pain related to spinal cord injury is mostly concentrated in mice, and research reports on human whole blood are rare [27, 28].

Based on the molecular expression level, this paper compares lncRNA and mRNA expression spectrum in peripheral blood tissue samples of NP patients after SCI, six KEGG signaling pathways were screened in the co-expression network, and three KEGG pathways with direct neuropathic pain were identified. We found that there are seven genes directly involved in the pain pathway.

Tumor-related studies have shown that the upregulation of factor E2F1 can cause pathological pain. The paracrine factors interacting with their receptors could cause the activation of downstream transcription factors such as E2F1 to upregulated expression of genes associated with pain [29]. Studies have shown that factor MITF can promote and survive osteoclast precursors, greatly enhancing the incidence of bone metastasis pain [30]. It was found in key genes related to diabetic nephropathy that CTNNA1 factor can cause other arrhythmogenic right ventricular cardiomyopathy [31]. When studied the updated mechanisms underlying sickle cell disease-associated pain, it was discovered that ADORA2B factor can be used as a target gene to cause pain. Studies have shown that mutations in the oxytocin receptor gene (OXTR) are associated with behavioral and neurological transference accuracy [32].

In summary, our results showed that the expression levels of E2F1, MAX, MITF, CTNNA1, and ADORA2B in the disease group were all significantly upregulated. Compared with the normal group, the expression of OXTR was upregulated. The data from the present study suggested that these genes may play the important role in NP (after SCI) and serve as the potential biomarkers of severe NP (after SCI) clinical diagnosis.

Conclusion

In summary, based on the molecular expression level, this paper compares lncRNA and mRNA expressions respectively in peripheral blood samples of the patients after SCI with NP and without NP, screens GO annotations and significantly involved KEGG pathways that are significantly associated with significantly different mRNAs, and screens them with WGCNA algorithm. Significantly related modules and RNAs were constructed and co-expression networks were constructed to screen for expression-specific disease-related biomarkers related to NP after SCI in peripheral blood samples of patients. Through these disease-related molecules, we can have a better prognosis for HP after SCI. The screening of these molecules will provide an important basis for future clinical targeted research.