Background

Inflammatory bowel disease (IBD) is a disorder of the intestines with two prominent disease types: ulcerative colitis (UC) and Crohn’s disease (CD). UC is confined mostly to the colonic mucosa, with persistent chronic inflammation [1], whereas CD is a transmural disease affecting the entire gastrointestinal tract [2]. The rising incidence of IBD has been attributed to gut-microbiome interactions, genetic predispositions, and environmental triggers [3], but more recently attention has turned to epigenetic mechanisms [4]. Non-coding RNAs (ncRNAs) play an important role in epigenetic regulation by affecting gene expression, but they can also directly affect protein function, and thus have a substantial impact on biological processes [5]. Though there are many types of ncRNAs, they are commonly categorized by length, which ranges from ~ 22 nucleotides (nts) to > 200 nts [6]; micro-RNAs, for example, are 20–25 nts long [7].

The Montreal classification system defines CD behaviors as B1 (non-stricturing/non-penetrating), B2 (stricturing), and B3 (penetrating). Similarly, the site of CD manifestation is stratified into three main locations: L1 (ileal), L2 (colonic), and L3 (ileocolonic) [8]. Previous CD studies suggest that environmental factors can modulate predisposition and affect disease outcomes [9], likely through genetic and/or epigenetic mechanisms. However, little is known about the role that epigenetics or ncRNA expression might play in influencing specific disease behavior, location, or inflammatory status during CD.

In fact, very few studies have investigated the role of ncRNAs in epigenetic mechanisms to determine whether they play a beneficial or detrimental role in IBD development and progression. Notably, we previously performed the first large-scale ncRNA analysis of CD and identified two differentially expressed (DE), up-regulated lncRNAs, LINC01272 and HNF4A-AS1, in CD patients [10]. Likewise, a few other studies have analyzed smaller datasets and observed changes in IBD-specific ncRNAs, i.e. LINC01272 [10], DIO3OS [11], and KIF9-AS [11], using RT-PCR in intestinal tissues and plasma samples from IBD patients. In particular, the expression of these ncRNAs was greater in IBD patients than in controls [11]. However, none of these studies explored whether ncRNA expression is specific to tissue type, CD disease behavior, disease location, or disease inflammation. Therefore, our goal in this study was to identify the ncRNAs associated with tissue location (ileum vs. rectum), CD disease behaviors (among B1, B2 and B3), and inflammation by disease location (among L1, L2 and L3) in CD, using the clinical and disease characteristics of CD participants from the large RISK cohort.

Using the RISK cohort [12], we evaluated high-density ncRNA transcriptomic profiles from 735 samples (345 ileal and 390 rectal biopsies) with well-defined 5-year patient follow-up and clinical metadata. We assessed whether CD-specific ncRNA expression was consistent across tissue location (ileum vs. rectum) and investigated whether changes in ncRNA levels are representative of CD disease behaviors (B1, B2 and B3). Lastly, we tested whether changes in ncRNA levels have the potential to distinguish inflammatory status by disease location within CD patients. Our results show dysregulation of ncRNA abundance at different mucosal locations during distinct stages of CD progression (that is, across disease behaviors), which is potentially useful for future clinical companion diagnostics.

Methods

Cohort

All samples used in this study were part of the RISK study (Risk Stratification and Identification of Immunogenetic and Microbial Markers of Rapid Disease Progression in Children with Crohn’s Disease) [12]. Prior to diagnosis and treatment, disease status and extent were confirmed histologically by a physician(s). Ileal and rectal bulk biopsies were obtained from newly diagnosed CD patients by colonoscopy. Patients with no bowel pathology, no gut inflammation, and no symptoms of IBD were considered non-IBD controls. Ileal biopsies were obtained from 71 non-IBD controls and 274 CD patients at diagnosis. Similarly, rectal biopsies were obtained from 61 controls and 329 CD patients at diagnosis from the same RISK cohort. CD patients were clinically assigned according to the Montreal classification system at baseline. Physician-assessed Montreal classifiers denoted disease behavior as B1 (inflammatory), B2 (stricturing), or B3 (penetrating), and inflammation location as L1 (ileal), L2 (colonic), or L3 (ileocolonic) [13]. Disease severity classifications, demographics, and clinical information were collected for each patient at the time of enrollment and during follow-up, and are provided in Table 1 along with other patient metrics, including age, sex, disease type, disease behavior, inflammatory status, and inflammation by disease location.

Table 1 Patient clinical characteristics

RNA-sequencing

All biopsies were extracted and processed as previously described [14, 15], and Illumina RNA sequencing libraries were prepared with the NEBNext Ultra RNA Library Prep Kit following the manufacturer’s recommendations (NEB, Ipswich, MA, USA). Approximately 77% of the 274 CD ileal biopsies (n = 212) were used in our previous publication [10]. These were re-sequenced at greater depth alongside the other ileal samples. Libraries were sequenced on the HiSeq system using paired-end (PE) 150 base pair chemistry by GENEWIZ, South Plainfield, NJ. Whole-biopsy RNA sequencing for both ileal and rectal samples was done in a single batch. Reads were aligned to the GENCODE v28 (hg38) reference genome and quantified using the STAR package [16]. edgeR was used to analyze differentially expressed long ncRNAs (DE lncRNAs) [17, 18]. In total, 20,779 ncRNAs were analyzed in both the ileum and rectum datasets. Overall, fifteen types of non-coding RNAs were assessed, categorized by nucleotide length (Additional file 1: Table S1).

Study design

The workflow and overall layout of this study are provided in Additional file 2: Fig. S1, which gives the total number of samples included and the number of differentially expressed ncRNAs (DEncRNAs) identified in each comparison. Overall, we examined the differential expression of ncRNAs across tissue types, CD disease behaviors, and inflammation by disease location in intestinal biopsies from a subset of pediatric CD patients from the RISK study (Additional file 3: Table S2).

Statistical analysis

Genome-wide differential expression analysis was conducted using the packages edgeR [17] and SARTools [19] in RStudio version 1.2 [20]. In this study, DE ncRNAs (both up- and down-regulated) were defined by a fold change (FC) > 2 and a false discovery rate (FDR) < 0.05 after multiple-testing correction. Principal component analysis (PCA) was performed using princomp [21].
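The thresholding rule described above can be illustrated with a minimal Python sketch. It is not the edgeR or SARTools pipeline used in this study: it assumes that per-gene log2 fold changes and raw P values are already available, and it assumes a Benjamini-Hochberg correction for the FDR. The function names `benjamini_hochberg` and `call_de` are hypothetical and introduced only for illustration.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def call_de(genes, log2fc, pvalues, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Label each gene 'up', 'down', or None using FC > 2 and FDR < 0.05."""
    fdr = benjamini_hochberg(pvalues)
    lfc_cutoff = math.log2(fc_cutoff)  # FC > 2 is |log2FC| > 1
    calls = {}
    for gene, lfc, q in zip(genes, log2fc, fdr):
        if q < fdr_cutoff and lfc > lfc_cutoff:
            calls[gene] = "up"
        elif q < fdr_cutoff and lfc < -lfc_cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = None
    return calls
```

Applying the fold-change cutoff symmetrically on the log2 scale treats up- and down-regulated ncRNAs alike, which is how "FC > 2" is read in this sketch.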

lncRNA-miRNA modeling

Thermodynamic calculations, binding affinities, and base-pair modeling were conducted using IntaRNA version 2.3.0, in conjunction with the Vienna RNA package 2.4.9 [2, 22, 23]. Using this server, possible interactions between the target miRNAs mir-1244-2, mir-1244-3, and mir-1244-4 and the query ncRNA RN7SL2 were computed with the following parameters: sliding window size, 150; maximum length of unpaired region, 150; maximum distance between two paired bases, 100; weight for the ED values of the target and query RNAs, 1; and temperature, 37 degrees Celsius. The heuristic for the hybridization end was also used in the molecular base-pairing predictions.

Random forest prediction

The samples were split into a training set and a validation set. The training sets contained an equal number of samples for each comparison. These were randomly selected based on alignment quality, sorted from best to worst. Of the selected samples, the training set consisted of 50% of the data, whereas the validation sets contained 50% of the remaining data. Using RandomForest, classifiers were built on the training set with n = 100,000 trees and mtry set to 2, and disease status, behavior, or inflammation was evaluated. To test each classifier, five-fold cross-validation was implemented. Each time, samples were arbitrarily selected to train and test the model. Cross-validation folds were fixed across all comparisons. Accuracies were calculated from confusion matrices of test-set class labels and test-set predictions. These accuracies indicate the overall robustness of each model.
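Two bookkeeping steps in this procedure, fixing the cross-validation folds and deriving accuracy from a confusion matrix, can be sketched in plain Python. The sketch does not build the random forest itself. The helper names `fixed_folds`, `confusion_matrix`, and `accuracy`, and the use of a seed to fix the folds, are assumptions made for illustration rather than the study's actual code.

```python
import random
from collections import Counter


def fixed_folds(n_samples, k=5, seed=0):
    """Assign sample indices to k folds; the same seed gives the same folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confusion_matrix(truth, predicted):
    """Count (true label, predicted label) pairs over the test set."""
    return Counter(zip(truth, predicted))


def accuracy(matrix):
    """Fraction of test samples on the diagonal of the confusion matrix."""
    total = sum(matrix.values())
    correct = sum(c for (t, p), c in matrix.items() if t == p)
    return correct / total
```

Reusing one seed across comparisons keeps the folds identical, so differences in accuracy between comparisons are not caused by different partitions.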

Results

ncRNA profiles within mucosal biopsies separate CD from controls

The first two PCs of the overall ncRNA transcriptomic profiles explained 34% and 12% of the variance in ileal biopsies (Fig. 1a), but only 16% and 7% in rectal biopsies (Fig. 1b). The first two PCs from the entire ncRNA transcriptomic profile showed that ncRNA levels have the potential to discriminate CD from non-IBD controls (Fig. 1a, b). Further differential expression (DE) analysis identified a total of 89 DE ncRNAs in the ileum when comparing 274 CD cases to 71 controls. Of these, 62 were up-regulated in CD, while 27 were down-regulated (Fig. 1c). A similar comparison in rectal biopsies showed 41 DE ncRNAs when 329 CD cases were compared to 61 controls (Fig. 1d). Of those, 18 were up-regulated and 23 were down-regulated in CD. Hierarchical clustering of DE ncRNAs revealed two independent clusters representing the CD and control groups. This pattern was observed for both ileal and rectal biopsies using the DE ncRNAs observed in each tissue separately (Fig. 1e, f). A list of all DE ncRNAs significant at FDR < 0.05, as well as the nominally significant DE ncRNAs (P < 0.05), for both ileal and rectal biopsies is provided in Additional file 4: Table S3. To further test whether the expression of CD-specific ncRNAs is consistent across different locations of the intestine, we compared the log2FC of ncRNAs observed in both ileal and rectal biopsies. Of the 130 DE ncRNAs tested, 17 were shared between the ileum and rectum, 87 were differentially expressed only in the ileum, and 25 were differentially expressed only in the rectum (Fig. 1g, Table 2, Additional file 2: Fig. S2a). We determined whether the 130 DE ncRNAs change in the same direction or magnitude regardless of tissue type by comparing the log2FC of ileal DE ncRNAs to those in the rectum (Fig. 1g). Surprisingly, 88% (n = 114) of DE ncRNAs changed in the same direction regardless of tissue type, with a strong positive correlation (R = 0.69; P < 2.2e−16). The remainder (n = 16) were directionally inconsistent, with a strong negative correlation (R = −0.79; P < 2.2e−16) (Fig. 1h). These 16 ncRNAs showed unique, statistically significant differences by disease status when examined in both ileal and rectal samples. Similar to our previous analysis [10], most of the DEncRNAs found in our analysis were either antisense or lincRNAs.
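The direction comparison above can be sketched as follows: ncRNAs are partitioned by whether their log2FC has the same sign in both tissues, and a Pearson correlation is computed within each partition. This is an illustrative Python sketch under the assumption that log2FC values are held in per-tissue dictionaries keyed by ncRNA name. The names `pearson_r` and `split_by_direction` are hypothetical, and the correlation type used in the study is not stated beyond "R".

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_by_direction(ileum_lfc, rectum_lfc):
    """Partition ncRNAs present in both dicts by whether log2FC signs agree."""
    shared = sorted(set(ileum_lfc) & set(rectum_lfc))
    same = [g for g in shared if ileum_lfc[g] * rectum_lfc[g] > 0]
    opposite = [g for g in shared if ileum_lfc[g] * rectum_lfc[g] < 0]
    return same, opposite
```

For example, `pearson_r([ileum[g] for g in same], [rectum[g] for g in same])` gives the correlation for the directionally consistent set.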

Fig. 1

Crohn’s disease versus controls DE ncRNAs. Using princomp, principal components of all ncRNAs (n = 20,779) were calculated in a ileal (n = 345) and b rectal (n = 390) biopsies, showing separate clusters for CD and non-IBD controls. c Volcano plot of the DE analysis in ileal samples, showing a total of 89 DE ncRNAs between CD and non-IBD controls. d In rectal biopsies, a total of 41 DE ncRNAs were identified. e, f Hierarchical clustering in heatmaps shows a clear separation of CD and non-IBD controls for 89 DE ncRNAs in ileal biopsies and 41 DE ncRNAs in rectal biopsies. g The log2FC of 130 DE ncRNAs obtained from both ileal and rectal biopsies are plotted. Each dot represents a CD-specific DE ncRNA in either the ileum or the rectum. The ncRNAs that are significant only in the ileum, only in the rectum, and in both the ileum and rectum are marked in dark red, light red and blue, respectively. h The CD-specific DE ncRNAs with inverted effects in the ileum and rectum are plotted

Table 2 CD versus Controls DEncRNAs

Next, to test the robustness and reliability of our findings, we compared our current results to our previous study [10], which contains a subset of ileal samples from the same RISK cohort. For this comparison, we excluded the matched ileal samples (n = 212) that were shared with our previous study; in this subset analysis, 188 DE ncRNAs were observed. A direct log2FC comparison of ncRNA expression in CD patients from the two studies showed a directionally consistent pattern with a strong positive correlation (R2 = 0.86; P < 2.2E−16), supporting the replicability of our methods (Additional file 2: Fig. S2b).

Taken together, these results show that changes in mucosal ncRNA levels are specific to CD and are most prevalent in the small intestine, but interestingly also show distinct changes in ncRNA signatures in the rectum. Among the 114 disease-specific DE ncRNAs, we also found that a small number were miRNAs (ncRNAs < 22 nucleotides in length), namely MIR-1244-1, MIR1244-2, and MIR1244-3, regardless of tissue location.

Pathways annotated to CD specific ncRNAs

The total ncRNA transcriptomic data appear to show larger CD-specific differences in the ileum than in the rectum. It was therefore not surprising that TopGO pathway analysis of the DE ncRNAs between cases and controls returned more significant pathway hits in ileal biopsies (n = 136) (Additional file 2: Fig. S3a, b) than in rectal biopsies (n = 36) (Additional file 5: Fig. S4a, b) (Additional file 5: Table S4 and Additional file 6: Table S5). However, 29 pathways were common to the ileal and rectal gene ontologies, including intracellular transport (GO:0046907) in the cellular component category, which was annotated by RN7SL2.

Behavior-specific DE ncRNAs in intestinal biopsies

Next, we examined whether PC1 of the 130 DE ncRNAs observed between CD and controls could differentiate Crohn’s disease behaviors (B1, B2 and B3). As expected, PC1 had the potential to differentiate the behaviors from one another (Fig. 2a). Therefore, we extended our analysis to CD disease behaviors. First, we compared the expression of ncRNAs in each CD behavior group against controls in both ileal and rectal biopsies, which revealed ncRNAs specific to distinct CD behavior groups. We noticed more DE ncRNAs in ileal biopsies, B1 (n = 70), B2 (n = 124) and B3 (n = 22) (Fig. 2b, Additional file 7: Table S6), than in rectal biopsies, B1 (n = 23), B2 (n = 9) and B3 (n = 14) (Fig. 2c, Additional file 8: Table S7).
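As an illustration of how a PC1 score per sample and its share of variance can be obtained from an expression matrix of DE ncRNAs, the following Python sketch uses power iteration on the covariance matrix. It stands in for princomp and is not the study's code. The function name `first_pc`, the starting vector, and the iteration count are assumptions, and the sign of the returned scores is arbitrary, as with any PCA.

```python
import math


def first_pc(rows, n_iter=500):
    """Return (PC1 scores, fraction of variance) for samples-by-features rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Covariance with divisor n; the variance fraction does not depend on it.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(p)]
           for i in range(p)]
    # Power iteration: repeated multiplication converges to the leading
    # eigenvector unless the start happens to be orthogonal to it.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_var = sum(cov[i][i] for i in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return scores, eigenvalue / total_var
```

The per-sample scores can then be grouped by Montreal behavior (B1, B2, B3) and compared, which is the kind of summary the PC1 analysis relies on.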

Fig. 2

Crohn’s disease behavior DE ncRNAs. a Using n = 89 DE ncRNAs from CD versus controls in ileal samples, principal components were calculated and PC1 was used to visualize clustering of Montreal classifiers. b DE analysis in ileal samples based on Montreal classifications reveals disease behavior-specific ncRNAs; the number of DE ncRNAs in each comparison is plotted in a Venn diagram. c A similar comparison was performed in rectal biopsies, and the number of DE ncRNAs in each comparison is provided. d–f Results of pairwise comparisons among CD Montreal classifiers are shown in volcano plots; significant DE ncRNAs were observed only in ileal biopsies

Similarly, comparison among CD disease behavior groups, from inflammatory (B1) to stricturing (B2) to penetrating (B3), showed an increasing pattern in the variance explained: 32%, 35%, and 45%, respectively (Additional file 2: Fig. S5a−g). The DE analysis among B1, B2 and B3 showed a similar, ileal-centric trend, with more DE ncRNAs observed in the ileum for B2 versus B1 (n = 35), B3 versus B1 (n = 13) and B3 versus B2 (n = 14) than in the rectum (Additional file 9: Table S8; Fig. 2c). Interestingly, all DE ncRNAs observed in B3 versus B1 (Fig. 2e) were also observed in the B2 versus B1 (Fig. 2d) and B3 versus B2 (Fig. 2f) comparisons, potentially reflecting certain CD characteristics that may be present across B1, B2, and B3. Notably, most of them were antisense RNAs or lincRNAs (Additional file 2: Fig. S6a, b). The same comparisons in rectal biopsies showed no statistically significant DE ncRNAs. Thus, our results indicate that a set of ncRNAs in ileal biopsies reflects the Montreal CD disease behaviors, whereas such a pattern was not observed in rectal biopsies of CD patients.

Inflammation and disease location-specific ncRNAs in CD

Inflammation is often a visible hallmark of CD; we therefore next examined whether the expression of ncRNAs in inflamed CD patients could distinguish them from non-inflamed patients and, furthermore, indicate the location of disease. We used two groupings assigned by physicians based on (i) inflammatory status (inflamed vs. non-inflamed) and (ii) disease (inflammation) location: ileal (L1), colonic (L2), and ileocolonic (L3). We tested whether PC1 obtained from the 130 CD-specific DE ncRNAs (Fig. 1) can differentiate inflammatory status and disease location in CD patients. Overall, PC1 obtained from ileal biopsies (Additional file 2: Fig. S7a) showed more significant differences between inflamed and non-inflamed/control samples than PC1 from rectal biopsies (Additional file 2: Fig. S7b). In particular, PC1 most strongly differentiated CD patients with L1 and L3 ileal inflamed disease locations from controls (P < 2.2E−16), compared with L2 CD patients with colonic inflammation (P < 2.2E−14) or patients with non-inflamed sites (P < 3.5E−08) (Fig. 3a).

Fig. 3

Crohn’s disease location DE ncRNAs. a Using n = 89 DE ncRNAs from CD versus controls in ileal samples, principal components were calculated and PC1 was used to visualize clustering of inflammation status by disease location. b DE analysis was performed for two comparisons. The first compared L1 (n = 51) + L3 (n = 147), representing inflamed (n = 198), versus non-inflamed (n = 20), which resulted in 31 DE ncRNAs. The second compared L1 (n = 51) + L3 (n = 147), representing inflamed (n = 198), versus non-inflamed + L2 (n = 47; 20 non-inflamed + 27 L2), which resulted in 21 DE ncRNAs. c Volcano plots of the DE analysis for both comparisons. d, e Boxplots of the log10(FPM)-normalized expression of the 31 DE ncRNAs obtained from L1 + L3 ileal inflamed groups versus non-inflamed ileal groups (without L2) in CD patients, shown by location and inflammation status. f, g A similar comparison for the 21 DE ncRNAs obtained from L1 + L3 ileal inflamed groups versus non-inflamed + L2 ileal groups in CD patients

We therefore went on to identify DE ncRNAs specific to the inflammation status and disease location of CD patients, as classified through the physician’s clinical assessment. Since this study is primarily focused on CD, in which the disease largely occurs in the ileum, we restricted this analysis to ileal biopsies. To identify the ncRNAs specific to disease location, the L1 and L3 CD patients (n = 198) were combined as the inflamed group and compared first with non-inflamed ileal CD patients alone (n = 20) and then with the non-inflamed + L2 groups together (n = 76) (Additional file 2: Fig. S8a−c), keeping in mind that the L2 CD patients were inflamed only in the colon, not in the ileum. Our DE analysis of the ileal biopsies showed 21 DE ncRNAs for L1 + L3 versus non-inflamed (Fig. 3b), and 31 DE ncRNAs for L1 + L3 versus non-inflamed + L2 (Fig. 3c) (Additional file 10: Table S9). A total of 10 DE ncRNAs were shared between the two comparisons (Fig. 3d). Log-normalized FPM (fragments per million) of the 21 DE ncRNAs differentiated the inflamed (L1 + L3) and non-inflamed groups (Fig. 3e) better than the comparison with 31 DE ncRNAs (Fig. 3f), which incorporated L2 samples into the non-inflamed group. Further, the FPM analysis by specific inflammation location showed the L1 and L2 groups as being similar, while the L2 and non-inflamed groups were more closely related (Fig. 3g, h). Taken together, these comparisons of inflammation and location status show that the ncRNA transcriptomic profiles of L2 CD patients were more like those of CD patients with non-inflamed ileal disease location than those of inflamed ones (L1, L3).
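For readers unfamiliar with the FPM scale used in these comparisons, a minimal Python sketch of a log10 FPM transformation is given below. It assumes that the library size is the sum of the counts supplied for the sample and that a pseudocount is added before taking the logarithm. Both are illustrative choices rather than details stated in this study, and `log10_fpm` is a hypothetical helper name.

```python
import math


def log10_fpm(counts, pseudocount=1.0):
    """Convert one sample's raw fragment counts to log10(FPM + pseudocount)."""
    # Assumption: library size is the total of the counts passed in.
    library_size = sum(counts.values())
    return {
        gene: math.log10(count / library_size * 1e6 + pseudocount)
        for gene, count in counts.items()
    }
```

Scaling by library size makes samples of different sequencing depth comparable, and the log scale keeps highly expressed ncRNAs from dominating a boxplot.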

DE ncRNAs RN7SL2, mir-1244-2,3,4

miRNAs have been observed to regulate multiple facets of gene expression, including other non-coding RNAs, and are known to be dysregulated during CD [24], yet the mechanisms remain unclear. Of interest is the down-regulation of RN7SL2 by mir-125b to control cell death [25]. In our analysis, we found an increase in the levels of mir-1244-2,3,4 and a decrease in the levels of RN7SL2 in CD versus controls (significant by FDR, but not by log2FC) (Fig. 4a). Using IntaRNA to test for molecular interactions between RN7SL2 and miRNA-1244-x (2,3,4), we obtained six possible predicted conformations with stable base pairing (Table 3). In one of these, RN7SL2 nucleotides 268–289 showed complementary base pairing with miRNA-1244-2,3,4 nucleotides 28–48; in a second, RN7SL2 nts 195–199 paired with miRNA-1244-2,3,4 nts 8–12. The ΔG° values of these interactions suggest stable binding; the interactions are represented schematically in Fig. 4b, along with a more realistic molecular model generated using SRP RNA structures based on ribosomal RNA interactions (Fig. 4c) [26]. Taken together, these results suggest that changes in miRNA levels during CD may have physiological impacts that can change cellular function and potentially alter disease outcomes, with RN7SL2 being a potential candidate for targeted therapy.

Fig. 4

Hypothesized ncRNA-miRNA interactions. a The log2FC of the ncRNAs RN7SL2, mir1244-2, mir1244-3, and mir1244-4 in CD versus controls in ileal and rectal samples were calculated and combined to generate one heatmap differentiating CD versus controls by Ward-D clustering. b Using IntaRNA (Freiburg RNA tools), the thermodynamics and kinematics of the predicted binding affinities are shown for one target, RN7SL2, a lncRNA, and three miRNAs, mir1244-2, mir1244-3, and mir1244-4. The first predicted interaction is located at A: [RN7SL2]: ‘268’ -- ‘289’ & [miRNA-1244-x]: [‘28’ -- ‘48’]. The second predicted interaction is located at B: [RN7SL2]: ‘195’ -- ‘199’ & [miRNA-1244-x]: [‘8’ -- ‘12’]. c The 3-D conformational location of RN7SL2, often the 7SL component of the SRP (Signal Recognition Particle), is shown

Table 3 Predicted binding affinities of RN7SL2 lncRNA with mir-1244-2, mir-1244-3, mir-1244-4 miRNAs

ncRNA as a potential tool to predict disease status, disease behaviors and disease location in IBD

Lastly, we tested the accuracy with which these non-coding elements predict disease versus controls, disease behavior, and disease inflammation in ileal biopsies using a RandomForest [27] approach. To test whether ncRNAs can serve as an index to predict disease status, we used the entire dataset of both CD and controls (CTRLs). The model showed an average AUC of 0.80 with 84% accuracy, reflecting the robustness of these DEncRNAs in distinguishing CD from controls (Fig. 5a). For disease behavior, because our dataset contained fewer B2 and B3 samples than B1 samples, we randomly downsampled the larger group in each comparison to the size of the smaller group. Thus, to predict B2 and B3 from B1, we randomly downsampled B1 to mitigate sample bias. With this, our model predicted B2 from B3 with a mean AUC of 0.84 and 80% accuracy (Fig. 5c), B2 from B1 with 0.72 AUC and 62% accuracy (Fig. 5b), and B3 from B1 with 0.68 AUC and 68% accuracy (Fig. 5d). Likewise, in comparing inflammatory status, the inflamed samples were downsampled to match the non-inflamed samples. Our model predicted non-inflamed (without L2) from inflamed with 0.63 AUC and 72% accuracy (Fig. 5e). Interestingly, a poorer prediction was observed when non-inflamed and L2 were tested against inflamed, with 0.55 AUC and 61% accuracy (Fig. 5f). Details of sample sizes in each comparison and the five-fold cross-validation prediction results for disease status, disease behavior and inflammation status are provided in Additional file 11: Table S10, Additional file 12: Table S11, Additional file 13: Table S12, Additional file 14: Table S13, Additional file 15: Table S14, Additional file 16: Table S15.
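The class-balancing and AUC steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's implementation: `downsample` draws a random subset of the larger class equal in size to the smaller one, and `auc` uses the rank-based definition (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). Both function names and the seed argument are hypothetical.

```python
import random


def downsample(majority, minority, seed=0):
    """Randomly draw as many majority samples as there are minority samples."""
    return random.Random(seed).sample(majority, len(minority))


def auc(pos_scores, neg_scores):
    """AUC as the chance a positive outranks a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC near 0.5, as in the inflamed versus non-inflamed + L2 comparison, means the classifier scores separate the two groups little better than chance.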

Fig. 5

Predictive ability of ncRNAs for disease status, disease behaviors and inflammation location. Using n = 13,777 ncRNAs with a base mean > 10, we tested prediction of disease status, disease behavior and disease location or inflammation in CD. The average accuracy and AUC across all 5 cross-validation folds are denoted on the figure(s). a Predictive ability of ncRNAs for disease status; AUC and accuracy were calculated for CD. b Predictive ability of ncRNAs for disease behavior (B2 from B1); AUC and accuracy were calculated for B1. c Predictive ability of ncRNAs for disease behavior (B3 from B1); AUC and accuracy were calculated for B2. d Predictive ability of ncRNAs for disease behavior (B3 from B2); AUC and accuracy were calculated for B3. e Predictive ability of ncRNAs for disease inflammation (inflamed vs. non-inflamed); AUC and accuracy were calculated for non-inflamed. f Predictive ability of ncRNAs for disease inflammation (inflamed vs. non-inflamed + L2); AUC and accuracy were calculated for non-inflamed. Five-fold cross-validation (CV) was performed across all the comparisons

Discussion

While changes in ncRNAs during CD have been documented [6, 10, 11, 28, 29], little is known about the role of ncRNAs in different locations of the intestine. At diagnosis, the location of the diseased mucosal biopsy obtained through colonoscopy often determines the diagnosis of the IBD subtype. Inflammation in the ileum is most common in CD, whereas rectal inflammation is common in UC. As is often the case, however, CD can manifest at multiple locations along the intestine. Using the same IBD patients from the RISK cohort, our previous transcriptomic analysis of protein-coding genes, combined with an analysis of genotypes (eQTL) for both UC and CD in different tissue types (ileum and rectum) [30], showed that the transcriptomic and eQTL signatures are distinct to disease characteristics. Herein, we have taken a similar approach and applied it to profiling ncRNAs in relation to disease location and behavior. In doing so, we have expanded on our previous analysis by including a larger number of CD patients. Importantly, we have shown that ncRNA changes correlate with the clinical subtypes originally diagnosed by the physician, and thus ncRNA profiling can potentially be applied as a tool to categorize CD and the corresponding inflamed disease location.

We previously observed that the gene signatures associated with the development of complications in CD are ileal-specific and often associated with genes involved in producing extracellular matrix (ECM) when progressing from B1 to B2 forms of CD [12]. In our current study, we found the highest degree of DE ncRNAs in the ileum, consistent with ileal-centric disease, but also observed multiple DE ncRNAs within the ileum and rectum associated with the ECM. For example, the ncRNA AC016735.2 has been observed to regulate COL1A1 and COL1A2, whose function involves mediation of collagen organization, a prevalent component of the stromal ECM [31], and we found that AC016735.2 was the most differentially expressed ncRNA in B1 and B3 disease. Since the intestinal epithelium requires properly functioning ECM to establish a functional barrier against luminal contents and prevent adverse immune responses [32], the ability to track early signs of collagen dysregulation by monitoring ncRNA levels may be an important diagnostic tool and potential target for therapy. The involvement of AC016735.2 in other gastrointestinal diseases such as gastric cancers [31] suggests that imbalances in this molecule’s function may play a negative role in IBD outcomes. Thus, by mapping such changes through ncRNA profiling and the locations where they take place, we have described an ncRNA profile for CD patients and provided further insights into potential epigenetic sources of disease manifestation, i.e. ncRNA dysregulation, within the mucosa.

In a Danish cohort of 213 CD patients, individuals with an ileal site of disease manifestation (L1) and stricturing behavior (B2) exhibited the highest risk for surgical intervention [33]. Our results, using CD classifications, show that individuals with B2 CD had the lowest correlation coefficient, independent of tissue type, in comparison to B1 and B3. Likewise, ileal (L1) forms of CD also exhibited the lowest correlation coefficients in both the rectal and ileal datasets. Patients with B2 CD and an ileal (L1) site of disease localization were the most distinct in terms of ncRNA transcriptomic profiling. We therefore show that the levels of non-coding genetic elements reflect distinct changes in CD patients that correlate with other clinical indicators and indexes of diagnosis, lending further validity to their potential use as biomarkers.

Consistent with our previous reports [10] and Braga-Neto et al. [34], we detected LINC01272 to be DE at increased levels (1.65 ×, ~ 2.9 log2FC) in CD versus controls, in both tissue types. Across the ileal and rectal samples, multiple DE ncRNAs, such as OVCH1-AS1, RP11-143J24.4, and RP11-184E9.1, showed increased expression in CD versus controls. Notably, RN7SL2 was the only down-regulated DE ncRNA observed in both the rectum and ileum, displaying significantly lower levels in CD samples. RN7SL2 is a 7SL RNA molecule that scaffolds the formation of a cytoplasmic ribonucleoprotein complex, the Signal Recognition Particle (SRP). This type of non-coding ribonucleoprotein (ncRNP) is conserved across multiple species, and the mammalian versions of SRP comprise six proteins and one RNA molecule, RN7SL2 [35].

Notably, RN7SL2, or RNSRP2, was down-regulated in both the ileal and rectal biopsies of pediatric CD patients in comparison to controls. It was the only ncRNA across the assessment panel that was statistically significant across disease status (CD versus controls) and disease behavior (B1, B2) in both the ileal and rectal mucosa. This suggests that certain non-coding genetic elements associated with CD may be detected across different tissues and sites of disease manifestation with similar DE trends. The functions of RN7SL2 involve mediating the transport of secretory proteins into the lumen of the endoplasmic reticulum (ER), sometimes via co-translational insertion [36]. In addition, there is an alternative conformation of 7SL RNA (RN7SL2) that forms a propeller-like secondary structure, with two hairpins joined by a tetranucleotide bulge loop at its 5’-end, which increases its topological efficiency and allows up-regulated activation of pol-III transcription [37]. Thus, changes in this molecule could potentially impact protein processing through the ER, export to the cell surface, and vesicle release as microparticles, and could even affect the translation of proteins at the ribosome by altering the transcriptional activity of pol-III and its transcription of rRNA and tRNA. However, how these changes in RN7SL2 levels drive the onset and/or progression of CD still needs further analysis.

Collectively, our Random Forest prediction results show that ncRNAs could be used as a tool to discriminate CD from non-IBD controls with good accuracy (average AUC of 0.80 with 84% accuracy). However, our sample sizes for disease behavior and inflammation status are too limited to achieve a better prediction model. With these results, we believe that clinical disease classifiers further support the utility of non-coding elements as potential biomarkers, prognostic tools, and pharmaceutical targets for therapy.

Taken together, our studies have revealed a clear change in the levels of ncRNAs in the mucosa during distinct phases of CD that correlate well with clinical classifiers. The predominant changes were ileal-specific signatures that likely involved changes in both the ECM and factors that regulate protein function at the ribosome, ER and nucleus. ncRNA changes are thus promising indexes of disease behavior and could potentially serve as therapeutic targets for treatment of distinct stages of CD.

Conclusion

Our study has shown that ncRNA levels in the intestinal mucosa change during CD in both the ileum and the rectum and correlate well with clinical indicators, but that the largest percentage of those changes occurred in the ileal tissues, reflecting the ileal-specific nature of CD. Since these signatures appear to correlate well with severe disease and location, they are likely indicators of disease status. Although it is unclear from our analysis whether the changes in ncRNA levels reflect cellular repair measures or contribute further to mucosal injury, the dysregulated levels of ncRNAs in the mucosal tissue of CD patients suggest they play a role in CD and might have clinical utility in aiding early identification and characterization of disease progression.