Background

A microRNA (miRNA), a noncoding RNA which contains about 22 nucleotides, plays a significant role on the regulation of gene expression. By binding to the 3′ untranslated regions (3′UTR) of the target messenger RNA (mRNA), which transfers the genetic information from DNA to the ribosome for protein synthesis in RNA polymerase, miRNA suppresses the translation of the targeting mRNA and/or can activate gene expression under certain conditions [1]. Due to those features and miRNA’s ability to control cell growth and differentiation, people believe that there is an association between miRNA’s deficiency or excess and human diseases [2].

Single nucleotide polymorphism or SNP is a change or mutation at a single position in a DNA sequence. The development of human diseases is also affected by SNP(s) on the basis of the observation that SNPs can cause synonymous and non-synonymous changes which affects amino acid sequence or protein function [3]. However, 3′ untranslated regions (3′UTR) of genes often contain variants which can change binding affinity of targeting miRNAs so that gene expression and protein translation will be affected. It’s important to pinpoint functional important variants in 3′UTRs and elucidate if the genes harboring these variants along with their targeting miRNAs are associated with genetic traits.

In many cancers, the gene PTEN is known to be regulated by miRNAs through competing endogenous RNA networks [4, 5]. Such miRNAs can also have other targeted genes containing SNPs that could contribute to cancers and/or human diseases by changing DNA sequences or influencing gene expression.

Genome-wide association studies (GWAS) has identified and analyzed millions of genetic risk variants which trigger the development of complex diseases [6], and it has reported significant numbers of loci hosting SNPs associated with many traits as well [7]. The disease causativeness annotation for non-coding variants (e.g. 3′UTR variants) identified from GWAS studies and clarification of their effects on miRNAs binding are challenging. MIGWAS (miRNA–target gene networks enrichment on GWAS) is an analytic pipeline to quantitatively evaluate tissue enrichment and elucidate complex biology of the genetic traits over miRNA-target gene networks which provide resources for the genetics of human complex traits, and thus contributes to a deeper understanding of miRNA’s influence on human diseases as well as drug discovery [7, 8].

Composite likelihood-based Covariance regression Network model (CoConet) is a network method for identification of trait-relevant tissues or cell types by incorporating tissue-specific gene co-expression networks [9]. CoCoNet further understands data from GWAS by demonstrating gene co-expression sub-networks which helps predict gene-level association effect sizes on and GWAS traits and diseases. Moreover, CoConet infers trait-relevant tissue based on tissue-specific gene co-expression patterns and proves that tissue-specific gene networks underlie disease etiology [9].

ANNOVAR [10] is an efficient software tool to utilize update-to-date information to annotate genetic variants based on their functional influence. SNPs reported from large scale association studies and sequencing projects can be annotated by ANNOVAR which can be utilized to report functional score and identify variants based on SNPs’ functional influence on genes [10].

The goal of our study is to identify causative SNPs which could change the binding affinity of miRNAs and genes, which could contribute to trait-tissue relevance over miRNA-target gene network.

Methods

We obtained summary statistics in the form of marginal z-scores for 43 traits from 28 GWAS studies [11]. These studies collect a wide range of complex traits and diseases that can be classified into six phenotype categories [12, 13]: anthropometric traits (e.g. height and BMI), hematological traits (e.g. MCHC and RBC), autoimmune diseases (e.g. CD and IBD), neurological diseases (e.g. Alzheimer's disease and Schizophrenia), metabolic traits (e.g. FG and HDL), and social traits (e.g. ever smoked and college completion). We removed SNPs within the major histocompatibility complex (MHC) region (Chr6: 25–34 Mb) following [14]. We then intersected the SNPs from all the studies and retained a common set of 622,026 SNPs for analysis. We paired the marginal z-scores from these studies with the SNP correlation matrix estimated using 503 individuals of European ancestry from the 1000 Genomes Project [15] for inference.

We employed MIGWAS [7], CoCoNet [9], and ANNOVAR [10] software packages to infer trait-relevant tissues based on next-generation sequencing omics data (e.g. GTEx data [16]) and annotated the variants obtained from GWAS and harbored by miRNAs targeted genes associated with traits. We also conducted functional analysis on miRNA targeted genes harboring 3′UTR variants with significant tissue-trait association using annotation tool DAVID [17, 18] for the Primary Biliary Cirrhosis (PBC) trait (Fig. 1).

Fig. 1
figure 1

Workflow of elucidating variants and miRNAs contributed to tissue-trait association

Results

Data acquisition

We obtained 43 traits from various published resources and classified them into six different categories with the total number of samples 3 × 106 (Table 1). Details of the information for the summary statistics of 43 traits from 29 GWAS studies are provided in Additional file 1: Table S1. The table lists the phenotype name, category, abbreviation, number of individuals, reference, and downloaded websites for each of the 43 traits.

Table 1 Total 43 traits and number of samples adopted in the study

MIGWAS Results

We applied MIGWAS to 43 traits (total samples = 3,035,223) and identified biologically relevant tissues. In particular, we identified that 94 out of 180 tissues have significant (p-value < 0.01) association with at least two traits and that 11 traits have significant association with more than 10 tissues (Fig. 2).

Fig. 2
figure 2

Association between tissues and traits identified by MIGWAS for 43 GWAS

We calculated p-value association between traits and all tissues using MIGWAS for 43 GWAS. Interestingly, it reported that PBC trait has the most significant p-value for all tissues (Fig. 3). A detailed association result from MIGWAS is shown in Additional file 2: Table S2.

Fig. 3
figure 3

Association between traits with all tissues identified by MIGWAS for 43 GWAS

We have identified total 332 miRNAs as candidate biomarkers of all traits. The miRNA distribution statistics for all reported traits by MIGWAS is shown in Fig. 4. We found that there are no miRNAs identified through MIGWAS overlapping across significant tissue-traits. We divided 31 traits into two groups that the significant trait group has 15 traits and the non-significant trait group has 16 traits. If the p-value of the trait is less than 0.01 in at least one tissue, it is considered as a significant trait. The non-significant trait Height has 60 miRNAs which is the most among all 31 traits. A list of detailed gene-miRNA pairs reported for all traits is shown in Additional file 3: Table S3.

Fig. 4
figure 4

The miRNA count distribution for all reported traits by MIGWAS (a includes significant traits whose p-value are less than 0.01 in at least one tissue, while b includes non-significant traits

CoCoNet output

We took candidate target genes identified by MIGWAS to retrieve Gene Level Effective Size of each gene (Ensemble gene ID) for the PBC trait. We used CoCoNet to calculate the association between traits and tissues in terms of log likelihood (Fig. 5).

Fig. 5
figure 5

Likelihood values between PBC trait and tissues (red ones have greater loglikelihood based on − 4536.7474 (the log(likelihood) for PBC))

ANNOVAR annotation

We examined the PBC trait showing the smallest association p-value reported by MIGWAS to see if there are any SNPs located in 3′UTR regions of genes targeted by miRNAs. To do so, we took the SNP list for the PBC trait and converted them to ANNOVAR variant annotated input format with ANNOVAR package. We obtained 9,890 annotated variants (5279 genes) exclusively located in 3′UTR region of these genes (Additional file 4: Table S4).

We crossed the 55 genes reported by MIGWAS with 5279 genes annotated by ANNOVAR and obtained 13 genes in common. The detailed information about these 13 genes is shown in Table 2.

Table 2 Annotation information for 13 genes shared by MIGWAS and ANNOVAR

PTEN associated variant analysis

Since PBC is a liver-related trait, we searched the TCGA LIHC dataset [19] and checked anti-correlated pairs for PTEN targeted miRNA: hsa-mir-590. We then used hsa-mir-590 to identify its targeted genes in the anti-correlated pair list and obtained 22 additional target genes. We then identified SNPs using UCSC Genome Browser for ten genes including PTEN with ClinVar variant information and reported the result in a new Additional file 5: Table S5.

Functional analysis

Gene Ontology (GO) analysis on 55 miRNA targeted genes harboring 3′UTR variants with significant tissue-trait association for PBC trait only showed that PFKL is located in a set of genes with term Acetylation (p-value < 0.05). This gene encodes the liver (L) subunit of an enzyme that catalyzes the conversion of d-fructose 6-phosphate to d -fructose 1,6-bisphosphate, which is a key step in glucose metabolism (glycolysis) [20].

The list of genes and their GO terms identified as the top cluster by DAVID is shown in Table 3.

Table 3 Genes and GO terms in the top cluster for 55 genes identified by DAVID

We have also conducted a GO analysis for 22 genes targeted by hsa-mir-590 with target relationship of PTEN. Some enrichment analysis results are reported in Table 4. The detailed DAVID results are reported in Additional file 6: Table S6 and Additional file 7: Table S7.

Table 4 Genes and GO terms targeted by hsa-mir-590 with target relationship of PTEN

Discussions

In our study, we used several cutting-edge bioinformatics tools (MIGWAS, CoCoNet and ANNOVAR) to dissecting SNPs reported for 43 traits from GWAS. The MIGWAS reported miRNA enrichment over target gene network for most (31) of traits. The CoCoNet was adopted to analyze the PBC trait for its significant tissue association over gene co-expression network. The ANNOVAR tool was employed to annotate 3′UTR variants class harbored by a list of miRNA target genes associated with Primary Biliary Cirrhosis (PBC) traits.

Although it may seem like the tools are interrelated and are three separate experiments, they are indeed connected. We first applied the tool MIGWAS to 43 traits and identified significant biologically relevant tissues. When the p-value association between traits and all tissues were calculated by using MIGWAS for 43 GWAS, we found that the PBC trait had the most significant p-value for all tissues. The CoCoNet experiment then utilized some of the results produced by MIGWAS to provide an output. Specifically, for the retrieved PBC trait result from the MIGWAS experiment, we used CoCoNet to analyze the tissues association over gene co-expression network. Again, we used the PBC trait result reported by MIGWAS in another tool: ANNOVAR. We analyzed the PBC trait for SNPs in 3′UTR regions by converting the SNP list into ANNOVAR variant annotated input, and we retrieved 5279 genes located in 3′UTR regions. We then crossed the 55 genes from MIGWAS with the 5279 genes from ANNOVAR and found 13 genes in common. The experiments might seem unrelated, but the results from MIGWAS analyzed by both CoCoNet and ANNOVAR shows that the 3 tools are indeed connected and are not interrelated.

In a previous study, the research found a strong association between PBC and rs231725 [21]. rs231725 is a SNP in the 3′ flanking region of CTLA4, a gene which has an impact on the risk of PBC [21]. Indeed, PTEN is associated with 9 traits (Crohn's disease [22], Inflammatory bowel disease [23], Rheumatoid arthritis [24], Systemic lupus erythematosus [25], Ulcerative colitis [23], Coronary artery disease [26], Type 2 diabetes [27], Alzheimer's disease [28], Autism [29]) out of our analyzed 43 traits.

Total 12 traits (Alzheimer’s disease, Coronary artery disease, Crohn’s disease, Ever Smoked, Fasting glucose, High density lipoproteins, Inflammatory bowel disease, Low density lipoproteins, Schizophrenia, Type 1 diabetes, Triglycerides, Ulcerative colitis) do not have MIGWAS association results reported in the analysis. It seems that liver tissue does not appear to have the smallest likelihood with PBC based on the total genes in human genome. We also ran CoCoNet with the targeted gene set of miRNAs enriched over the target network for PBC trait only to calculate loglikelihood, and the result stays the same.

The transcript expression of PFKL has the highest RPKM (Reads Per Kilobase of transcript, per Million mapped reads) in kidney samples in the RNA-seq which was performed from 4 human individuals in order to determine tissue-specificity of all protein-coding genes [30].

PFKL contains a transcript variant ATP-dependent 6-phosphofructokinase, liver type isoform a which represents the longer transcript and encodes the shorter isoform (a). The CD-Search shows the protein classification of this transcript variant, which is Eukaryotic_PFK domain-containing protein, and finds a specific hit, Eukaryotic_PFK, which shows a high confidence association between the query sequence and a domain model. Phosphofructokinase (PFK) is a key regulatory enzyme that that phosphorylates fructose 6-phosphate in glycolysis. It belongs to the PFK family, evolving from the bacterial PFKs by gene duplication and fusion events and then exhibiting complex behavior. PFK family also includes ATP-dependent phosphofructokinases (allosteric homotetramers) and pyrophosphate (PPi)-dependent phosphofructokinases (mostly dimeric and nonallosteric homotetramers). In addition, protein sequences of Opisthokonta are in the multiple sequence alignment in the Eukaryotic_PFK [30].

We found 13 genes, UGT3A, SYNGR, ASAP3, CD80, GXYLT1, IFNAR, KLHL22, POU2AF1, SFMBT2, SIRPB1, SIRPG, ZC3HAV1 and ZNF585B which are in common in the gene list applied from MIGWAS and ANNOVAR.

ASAP3 encodes a member of a subfamily of ADP-ribosylation factor(Arf) GTPase-activating proteins that its expression level affects cell proliferation and migration, and it called up-related in liver cancer 1 [31]. DDEFL1 (cloned ASAP3) is unregulated in hepatocellular carcinoma (HCC) through microarray analysis. The studies [32] showed that DDEFL1 had increased expression and had overexpression which increased colony formation in NIH3T3 cells and human hepatoma cell lines [32].

In the transcript expression of UGT3A1, liver samples have the second highest RPKM among 27 different tissues [33].

SFMBT2 are polycomb group proteins that bind to methylated lysins in histone tails. The formation of transcription-resistant higher-order chromatin structures at target genes are induced by it so that this gene can repress transcription [34]. SFMBT2 was cloned and designated as KIAA1617 by Nagase et al. [35], who obtained clones from fetal brain cDNA library and then sequenced them [35]. Kuzmin et al. [36] reported that SFMBT2 interacted with the YY1 (600013) transcription when SFMBT2 was expressed from the paternal allele in mouse blastocysts and in mouse embryonic tissues early in development, and later during mouse embryonic development [36].

ZC3HAV1 encodes a zinc finger protein that can prevent replication certain viruses and inhibit viral gene expression by targeting and eliminating viral mRNAs in the cytoplasm [37]. Yu et al. [38] identified ZC3HAV1 and called it ZAP in their analysis of gene expressed in human fetal liver [38]. Northern blot analysis of rat Zap that was cloned by Gao et al. [39] showed that mRNA was highly expressed in kidney and liver [39].

Conclusions

Our study tried to understand GWAS data through identifying candidate miRNA-gene pairs over miRNA-target gene network. Using several cutting-edge bioinformatical tools and databases and adopting visualization, we assessed trait-tissue relevance based on tissue-specific gene co-expression information including protein–protein interaction and transcription factor binding evidence. We annotated 3′UTR variants harbored by genes targeted by miRNAs expressed in tissues significantly associated with PBC trait. Our study provides evidence that the association between tissues and traits could be affected by the 3′UTR variants of genes which change binding affinity of targeting miRNAs. The analysis emphasizes the influence of variants on genetic traits and miRNA-targeting gene networks, and thus could contribute to additional studies and detections on human diseases. We think our study provided a valuable approach for elucidating 3′UTR variants which could contribute to genetic traits with tissue relevance in the context of miRNA influential human diseases.