Detecting microRNAs of high influence on protein functional interaction networks: a prostate cancer case study
- Cite this article as:
- Alshalalfa, M., Bader, G.D., Goldenberg, A. et al. BMC Syst Biol (2012) 6: 112. doi:10.1186/1752-0509-6-112
The use of biological molecular network information for diagnostic and prognostic purposes and elucidation of molecular disease mechanism is a key objective in systems biomedicine. The network of regulatory miRNA-target and functional protein interactions is a rich source of information to elucidate the function and the prognostic value of miRNAs in cancer. The objective of this study is to identify miRNAs that have high influence on target protein complexes in prostate cancer as a case study. This could provide biomarkers or therapeutic targets relevant for prostate cancer treatment.
Our findings demonstrate that a miRNA’s functional role can be explained by its target protein connectivity within a physical and functional interaction network. To detect miRNAs with high influence on target protein modules, we integrated miRNA and mRNA expression profiles with a sequence based miRNA-target network and human functional and physical protein interactions (FPI). miRNAs with high influence on target protein complexes play a role in prostate cancer progression and are promising diagnostic or prognostic biomarkers. We uncovered several miRNA-regulated protein modules which were enriched in focal adhesion and prostate cancer genes. Several miRNAs such as miR-96, miR-182, and miR-143 demonstrated high influence on their target protein complexes and could explain most of the gene expression changes in our analyzed prostate cancer data set.
We describe a novel method to identify active miRNA-target modules relevant to prostate cancer progression and outcome. miRNAs with high influence on protein networks are valuable biomarkers that can be used in clinical investigations for prostate cancer treatment.
Keywords: miRNA, Protein interactions, Systems biology, High-influence miRNA
A major challenge in biomedical research is to understand the underlying mechanisms of human disease. Great effort has been spent on determining genes associated with human diseases. However, most human diseases, and cancer in particular, cannot be attributed to a single gene but arise from complex interactions among multiple components of the cell, including genes, proteins, and miRNAs. miRNAs are a large family of gene regulators, found in both plants and animals, which impact gene activity by binding to the 3’UTR of target mRNAs, leading to mRNA degradation or translational inhibition[2, 3]. Though miRNAs are only 18-22 nucleotides long, each can control the expression of hundreds of genes. It is estimated that approximately half of the human genome is regulated by miRNA-mediated gene control. miRNAs play a key role in regulating diverse cellular functions, such as development, proliferation, apoptosis, and metabolism, and are associated with a growing list of diseases including cancer[5, 6]. An increasing body of evidence suggests that miRNAs impact gene expression in many cancer types including prostate cancer[7–9]. Several studies have investigated the role of miRNAs in cancer using mRNA and miRNA expression profiling[3, 10]. A better understanding of the regulatory role of miRNAs in cancer development and progression requires exploring their influence on the other components of the cellular system they are part of. Doing so may lead to the identification of predictive biomarkers and the development of novel therapeutic strategies for cancer.
Current major challenges in miRNA research are the prediction and experimental validation of miRNA-target interactions, and the determination of the functional role of miRNAs. Computational prediction of miRNA targets is challenging in the human genome because of the imperfect pairing of the miRNA with the corresponding target site. Several factors can influence miRNA-mediated gene control, such as 3’UTR length, number of miRNA target sites, degree of complementary match, amount of target mRNA[12, 13], and the competition for targeted mRNA. Unfortunately, current sequence based predictions produce many false positive interactions, and many of the predicted interactions may not be functional, meaning that there may be no relationship between the expression levels of the mRNA and the predicted targeting miRNA. Several studies have tried to address this by integrating gene expression data with sequence-based prediction to remove non-functional interactions and keep interactions that show negative correlation between miRNAs and their targets[10, 16]. Thus, sequence-based methods provide a general view of the potential miRNA targets, but expression data or other cellular context information is required to predict miRNA-target interactions more accurately.
Determining the role of individual miRNAs in cellular regulatory processes is still a major challenge. The function of many miRNAs remains unknown, and even for relatively well studied miRNAs, only a handful of their targets have been characterized[17, 18]. Delineating miRNA function through knock-out and overexpression experiments in model organisms has had limited success, possibly because of functional redundancy among miRNAs or among gene pathways regulated by miRNAs. A miRNA downregulates its targets, so negative correlation in expression levels between a miRNA and its direct targets suggests that the miRNA is functional. Several studies have attempted to extract miRNA-target modules based on the correlation between miRNAs and targets and based on graph theory. However, these results are complicated by indirect effects: a single miRNA may target many mRNAs that in turn influence other genes, so negative correlation between a miRNA and a putative target does not necessarily indicate a direct interaction between them.
Interactions between miRNAs and targets do not depend solely on the 3’UTR of the target, but also on which other competing 3’UTR targets are expressed in a given cellular context. Limited attempts have been made to investigate the impact of miRNAs on protein interactors of the target. It has been shown that protein-protein interaction (PPI) network topological features help to filter out false positive targets and to prioritize miRNAs in prostate cancer. Recent evidence showed that some protein complexes are enriched with targets of a single miRNA and others with targets of a miRNA cluster. For example, the SMAD3-SMAD4-FOXO3 complex is enriched with miR-1284 targets, and the MAD1-SIN3A-HDAC2 complex is enriched with targets of the miR-510-514 and miR-1912-1264 clusters. Other studies demonstrated that the PPI context of miRNA targets provides more representative information about miRNA function than direct targets alone: direct targets of miRNAs and their partners jointly showed higher modularity than the direct targets by themselves. Analyzing properties of miRNA targets is a promising approach to miRNA function prediction. mirPath is a computational tool developed to identify molecular pathways enriched in miRNA target sets. mirPath extracts miRNA targets from other tools such as TargetScan and PITA, and then predicts miRNA function by assessing whether the predicted targets of a given miRNA are enriched for particular functional annotations. Such enrichment based methods suffer from several limitations. First, they depend solely on miRNA-target prediction algorithms, which are noisy. Second, predicted miRNA target sets are usually large (hundreds to thousands of genes), which leads to heterogeneous functional annotations that make it difficult to obtain high confidence predictions. Integrating expression data is a promising approach to reduce noise in enrichment results. 
The miRNA body map is a web tool developed for miRNA functional annotation in normal and diseased human tissues that integrates expression data to reduce heterogeneity in functional annotations. FAME is another tool with three main applications in the area of miRNA functional analysis. Firstly, it infers miRNA function directly using sets of genes sharing common annotations and secondly, infers miRNA function indirectly using matched mRNA/miRNA expression data. Thirdly, FAME predicts the function of genomic clusters of miRNAs. Integrating the protein context of miRNA targets is another promising dimension for miRNA function prediction. miRUPnet is another miRNA function prediction framework that predicts miRNA function based on the upstream context of miRNA and not downstream. It infers the miRNA function by functionally analyzing the context of its transcription factors in a protein-protein interaction network. Using information about TFs upstream of a miRNA results in the discovery of additional biological processes not seen in miRNA targets (downstream). These observations shed light on the influence of miRNAs on the PPI subnetwork involving the targets, and highlight the importance of considering target protein interactors when searching for functional miRNA-target interactions.
In the post-genomics era, a crucial task in molecular biology is to understand miRNA regulation in the context of biological networks. Since miRNAs target proteins that are part of either protein complexes or signaling pathways, it is important to study the influence of miRNAs on protein networks in disease progression. Several recent studies have characterized the role of miRNAs in the context of protein networks[25, 32–34]. By analyzing the interactions between miRNAs and cellular signaling networks, miRNAs were found to predominantly target proteins of the same signaling pathway, highly connected scaffolds, and the most downstream network components such as signaling transcription factors. miRNAs were also found to target upstream components of signaling pathways, such as membrane receptors and ligands, less frequently. Hsu et al. demonstrated that many miRNA-targeted genes are hub proteins and bottleneck proteins in protein interaction networks (PPIN) and thus have higher betweenness centrality. When these hub or bottleneck proteins are repressed by individual or multiple miRNAs, they may consequently influence a large part of the interacting proteins and thus control key components of the PPIN. Their analysis showed that the target proteins of individual miRNAs tend to interact with more proteins than non-targets do. A positive correlation between protein connectivity (degree in the PPIN) and the number of miRNAs targeting the corresponding protein has been observed by Liang and Li. This suggests that proteins with large numbers of partners in the PPIN need more miRNAs to control their expression. miRNA-induced influence can propagate through the regulatory network by targeting master transcription factors. Cui et al. found that 42% of the 9348 genes that are regulated by TFs are miRNA targets, and that the average TF binding site count of miRNA targets is significantly higher than that of non-targets. 
This suggests that gene expression regulation by miRNAs at the post-transcriptional level is coordinated with that of TFs at the transcriptional level and genes targeted by more miRNAs have more TF binding sites.
In this work we introduce a new method to characterize miRNA function based on its effect on the expression of the target and its neighbors in a functional interaction network. Unlike previous methods that weight miRNA-target interactions based on sequence complementarity or gene expression correlation alone, we estimate the overall influence of a miRNA on its target based on the target gene expression level and the gene expression levels of the interaction neighborhood of the target. miRNAs with high influence are validated using independent miRNA expression datasets, and by analyzing the biological pathway enrichment of target protein modules. We then used our miRNA-target influence network to predict the overall influence of each miRNA on individual prostate cancer patients to find those miRNAs associated with aggressive cancer. We show that miRNAs with high influence on protein complexes and biological processes are likely involved in cancer progression and have potential prognostic significance.
Human miRNA target predictions for miRNAs with conserved 3’UTR sites were taken from TargetScan 5.1, and experimentally validated miRNAs and their targets were taken from miRTarBase and miRecord. We used the union of miRTarBase and miRecord as a source of experimentally validated miRNA-target interactions (3976 interactions between 345 miRNAs and 2277 genes).
Functional protein interaction (FPI) networks
We used combined undirected functional protein interactions (FPI) as described in. FPI includes annotated functional protein interactions from Reactome, Panther, CellMap, BioCarta, KEGG and TRED, as well as interactions derived from physical protein interactions, co-expression data, and domain-domain interaction data. FPI was constructed using a naive Bayes classifier (NBC) to distinguish high-likelihood functional interactions from non-functional pairwise relationships. We also used physical protein interactions from the HPRD database. The FPI network includes the HPRD interactions, but the two networks have distinct topological features. In addition, we used a curated human signaling network from Cui et al.
miRNA and mRNA expression data
We used mRNA and miRNA expression data from the MSKCC Prostate Oncogenome Project (Taylor data), which is available at the Gene Expression Omnibus (GEO accession number: GSE21032). The data contains expression levels of 26443 genes across 179 samples (131 primary cancer, 19 metastatic, and 29 normal samples), and expression of 370 miRNAs across 140 samples. We used the expression data of the 139 samples with both mRNA and miRNA data for our analysis. To validate the miRNA results obtained using the Taylor data, we used localized prostate cancer miRNA expression data from an independent prostate patient cohort (GSE23022) and from prostate cell lines (NCI60).
miRNA-target influence (miRTI) network construction
p(d,r) is the joint probability density function (pdf) of miR and t, and p(d) and p(r) are the marginal pdfs of miR and t respectively.
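For reference, with p(d,r), p(d) and p(r) as just defined, the textbook definition of mutual information between two continuous expression profiles can be written as follows. This is the standard form only; the estimator used to evaluate it on expression data (for example binning or kernel density estimation) is not specified here and is left open.

```latex
\mathrm{MI}(miR, t) \;=\; \iint p(d, r)\, \log \frac{p(d, r)}{p(d)\, p(r)} \,\mathrm{d}d \,\mathrm{d}r
```

MI is non-negative and equals zero exactly when the two profiles are independent, so it captures linear and non-linear dependence alike but carries no sign.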
We propose that the influence of a miRNA (miR) on its target (t) depends on three variables. The first is the strength of the negative correlation between the miRNA and target expression profiles, measured as CorrmiR(miR,t) = MI(miR,t) if Seq(miR,t) = 1 and CorrmiR(miR,t) = 0 if Seq(miR,t) = 0, where Seq is the binary sequence-based miRNA-target prediction. We only considered miR and t pairs with negative Pearson’s correlation and Seq(miR,t) = 1. This filter is needed because MI is unsigned, so miRNA-target pairs with high MI due to positive correlation would otherwise be retained.
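The first variable can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the equal-width binning estimator of MI, the choice of three bins, the function names, and the convention of returning 0 for pairs dropped by the Pearson filter are all assumptions made for the example.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def discretize(values, bins=3):
    """Map values to equal-width bin indices 0..bins-1 (assumed estimator detail)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=3):
    """Plug-in MI estimate (in nats) from binned profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def corr_mir(mir_expr, target_expr, seq, bins=3):
    """CorrmiR(miR,t): MI if Seq = 1 and Pearson correlation is negative, else 0.

    Returning 0 for positively correlated pairs is an assumption about how the
    filtered-out pairs are represented.
    """
    if seq != 1:
        return 0.0
    if pearson(mir_expr, target_expr) >= 0:
        return 0.0
    return mutual_information(mir_expr, target_expr, bins)
```

The Pearson check supplies the sign that MI lacks: a perfectly co-expressed pair has the same MI as a perfectly anti-correlated one, but only the latter is consistent with miRNA-mediated repression.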
Second is the direct impact of the miRNA on the expression of the partners of the target. We calculated mutual information (MI) between the expression profiles of each target and its FPI partners (CorrFPI), where CorrFPI(ti,tj) = MI (ti,tj) if ti is linked to tj in FPI, and CorrFPI(ti,tj) = 0 otherwise. We used the maximum MI between the target and its partners to represent the direct influence of miRNA on the target partners.
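The second variable, the maximum MI between a target and its FPI partners, can be sketched as below. The function name `corr_fpi_max`, the dictionary-based representation of the network and expression data, and the choice to pass the MI estimator in as a callable are illustrative assumptions rather than details taken from the method.

```python
from typing import Callable, Dict, Sequence, Set

Profile = Sequence[float]

def corr_fpi_max(target: str,
                 expr: Dict[str, Profile],
                 fpi: Dict[str, Set[str]],
                 mi: Callable[[Profile, Profile], float]) -> float:
    """Maximum CorrFPI(target, partner) over the target's FPI partners.

    expr maps gene -> expression profile, fpi maps gene -> set of partners,
    and mi is any mutual-information estimator (hypothetical interface).
    Genes that are not linked in FPI contribute CorrFPI = 0, so a target
    without partners (or without expression data) scores 0.
    """
    if target not in expr:
        return 0.0
    partners = fpi.get(target, set())
    scores = [mi(expr[target], expr[p]) for p in partners if p in expr]
    return max(scores, default=0.0)
```

Taking the maximum rather than the mean means a single strongly co-varying partner is enough to give a target a high score, which is one reading of "direct influence on the target partners"; other aggregations would give different rankings.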
Third is the indirect influence of the miRNA on the expression of the target through its partners. This indirect impact is defined as W, where k ranges over the partners of t in FPI.
Using the miRNA-target influence (miRTI) network to measure the influence of miRNAs on prostate cancer progression
Pα is the elastic-net penalty, a compromise between the ridge regression penalty (α = 0) and the lasso penalty (α = 1). In this model we set α to 0.5 as a middle value between ridge regression (α = 0) and lasso regression (α = 1). Setting α = 0.5 reduces the sparsity achieved by lasso regression while still penalizing correlated predictors. This penalty is particularly useful when there are many correlated predictor variables, as in the case of miRNAs. We tried several values of α and saw that changing α from 0 to 0.5 or 1 dramatically changes the minimum λ value and the number of non-zero values in the solution. Setting α to 0 produced a large number of non-zero values in the solution (113), and α = 1 produced a small number of non-zero values (11), leading to a very sparse solution that might drop predictors with a small effect on the response. Setting α to 0.5 leads to a solution of medium sparsity with 31 non-zero elements (Additional file1: Figure S1). However, varying α around 0.5 (0.3-0.7) changes the minimum λ value only slightly (by 0.01), which does not affect the number of non-zero elements. We selected α = 0.5 because at this value the minimum λ leads to the minimum MSE. β is the regression coefficient of each variable, which indicates how well the expression level of each miRNA can explain the gene expression profile of prostate cancer samples. λ is a factor that determines the sparsity of the solution: as λ increases, the number of nonzero components of β decreases. In this study, we evaluated 100 values of λ and used the one that minimizes the mean square error. More details on λ optimization with respect to α are shown in Additional file1: Figure S1. Elastic-net regression was fit using ten-fold cross validation. We used the glmnet package available at http://www-stat.stanford.edu/tibs/glmnet-matlab/ to solve the regression model. For each patient we predicted the influence of the miRNA set on the patient’s gene expression profile. 
Figure1B describes the steps to construct the input of the model and its output. The resulting miRNA-patient influence profile was used to associate a miRNA with a sample’s outcome.
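For orientation, the penalized least-squares problem solved by glmnet for a Gaussian response has the form below. This is a sketch under the assumption that the model follows glmnet's standard parameterization; the symbols y (the response, taken here to be one patient's gene expression profile), x_i (the row of miRNA predictor weights for gene i), and n (the number of genes) are labels introduced for illustration.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\; \lambda\, P_\alpha(\beta),
\qquad
P_\alpha(\beta) \;=\; \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \;+\; \alpha\,\lVert\beta\rVert_1
```

With α = 0 the penalty reduces to the ridge term and with α = 1 to the lasso term, which are the two extremes compared in the text; α = 0.5 weights the two terms equally up to the factor of 1/2 on the squared norm.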
Detection of transcriptional activity centers in prostate cancer
Where N is the total number of neighbor genes.
Using the miRNA-target influence (miRTI) network to identify miRNA influence on genes with high activity center scores
The output of this model is a coefficient for each miRNA that represents how each miRNA explains the expression activity score of the genes.
Global correlation between functional protein network topology and miRNA regulation
Correlation between protein network structure and miRNA activity
Prostate cancer miRNAs target functionally associated genes
The FPI network helps reveal miRNA-target modules that play a functional role in prostate cancer
Since TargetScan and FPI interaction data are noisy, we repeated the experiments using a highly curated human signaling network and curated miRNA-target interactions from miRecord and miRTarBase. The predicted interaction network (Additional file1: Figure S8) was found to be modular, and partners of miRNA targets are found in highly dense network regions. 31 miRNAs were identified as having high influence on the protein signaling network. 27 of them are in the list of 54 prostate miRNAs (p = 0.0001) that we collected from the literature (Additional file1: Table S1). Although the topological structure of the target modules is not very similar to that of the modules in Figure5, owing to differences in the protein networks and miRNA-target interactions used to find them, the miRNA targets’ partners are modular and form subnetworks of potentially dysregulated proteins. 13 of the 31 miRNAs predicted to have high influence on the signaling network were in common with the 70 miRNAs predicted to have high influence on the FPI network (p = 0.0003). These results suggest that the completeness of the protein interaction network plays a crucial role in identifying high influence miRNAs.
Functional miRNA-target modules are prognostic biomarkers that help identify patients with aggressive tumors
Patient specific miRNA influence helps predict cancer recurrence
Identification of miRNAs of high-influence on principal regulators
Tumor heterogeneity affects the identification of robust cancer biomarkers: Li et al. found that most cancer gene signatures are neither robust nor reproducible, and proposed a re-sampling based framework to identify robust cancer biomarkers. In this work we asked whether re-sampling has an effect on the ActivityScore profile. To answer this question, we used Significance Analysis of Microarrays (SAM), which is based on re-sampling, to identify differentially expressed genes and then generated an ActivityScore profile from the SAM results. We repeated the SAM analysis 100 times, each time changing the permutation number and generating an ActivityScore profile. The resulting profiles were very highly correlated (R² = 0.9996), which indicates that re-sampling does not affect the ActivityScore and that the identified activity centers are robust and reproducible within our data set.
Prostate cancer is one of the most commonly diagnosed malignant tumors in older men in North America. miRNAs, a family of regulatory molecules, are significantly altered in prostate cancer. However, the mode of action of miRNAs, and how the influence of prostate miRNAs on target expression is involved in prostate cancer progression, are not well understood. Over- or under-expression of specific miRNAs in different tumors makes them potential therapeutic targets and diagnostic or prognostic biomarkers; miRNAs that are not only differentially expressed but also influence their targets and target partners are important regulators, and are thus more promising for diagnostics, prognostics or therapy.
In this work we use functional protein interactions to identify miRNAs with high influence on targets and their partners. We hypothesize that miRNAs that influence a large number of interacting proteins are more important than those that only affect a few proteins. We first showed that proteins that are highly connected have more regulating miRNAs compared to those with low connectivity. Thus, identifying miRNAs that regulate highly connected proteins is important to understand how to control propagation of gene expression changes via miRNAs. We showed that miRNAs that have been experimentally verified to play a role in prostate cancer target functionally related genes. This motivated us to investigate how miRNAs that have high influence on protein partners of the target genes help us to better understand prostate cancer. In this work we bridge a gap between systems biology and clinical biology by investigating the association between miRNAs that have high influence on the system with the outcome of the system.
We built a miRNA-target influence network (miRTI) by tracing the influence of miRNA expression in prostate cancer on downstream genes in the FPI network, and then proposed three applications of this network. First, we used it to identify miRNA target functional modules and complexes. This revealed miRNAs with high influence on the target FPI neighborhood, which suggests that these miRNAs are important in prostate cancer. The difference between high-influence miRNAs and differentially expressed miRNAs is that high-influence miRNAs are differentially expressed and also have differentially expressed targets and target interaction neighbors. Validating both miRNAs and targets in the functional modules against independent prostate miRNA expression datasets indicates that they are robust prostate cancer diagnostic biomarkers. Analyzing functional modules of miRNA targets revealed several results. First, target genes are enriched in prostate cancer and focal adhesion pathways, which may help explain the progression and metastasis process, as our data includes metastatic samples. Functional modules are also of prognostic significance, as they were associated with cancer recurrence and cancer specific death. Moreover, the miRTI network (Figure4) revealed that some proteins, such as BTBD7, ANK2, and COL12A1, are highly repressed by several miRNAs. On the other hand, some miRNAs (miR-96, miR-182, miR-1) are highly influential on target partners as they regulate several connected proteins. This suggests that miRNAs have different modes of action based on their influence on the expression of the target neighborhood. This might help to define new regulatory classes of miRNAs based on their mode of action.
The second application of miRTI is to predict patient-specific miRNA influence using a regression model. In this application we used the miRTI network to predict the gene expression profile of the patients (PCs). The regression model yields a miRNA-PCs network that shows how much each miRNA explains the gene expression profile of a patient, based on the weight with which it affects its targets. We applied the regression model to all patients and generated a matrix that represents the influence of each miRNA on each patient. Based on this miRNA influence matrix we were able to group patients into aggressive and low risk cancer patients. Comparing the miRTI network with the Seq network demonstrated that using miRNA-target influence interactions gives more knowledge about the miRNA mode of action than using the binary Seq weights, which are based only on sequence predictions. This result supports our initial conclusion that considering the downstream effect of a miRNA on the protein partners of its targets is useful and has prognostic value. We found that grouping patients by miRNA expression and grouping them by patient-specific miRNA influence from the miRNA-PCs network both place high risk patients in one group and low risk patients in the other. This indicates that the influence of each miRNA on each patient is reflected in the mRNA expression of the patient. The availability of differential miRNA and mRNA expression profiles from the same cancer samples enables functional analysis of miRNAs in cancer, but few cancer cohorts have expression levels of miRNA and mRNA from the same sample. This result is therefore promising as a way to predict the expression of miRNAs in patients, and their outcome, without performing miRNA expression profiling.
The third application of the miRTI network is to predict miRNAs with high influence on genes with high activity center scores (highly active network neighborhoods). The ActivityScore profile of prostate cancer summarizes the activity of module proteins rather than the activity of single genes as in the second application. Here the miRTI network is used to predict the ActivityScore using the regression model. The results emphasized the role of some miRNAs already validated in prostate cancer (miR-221, miR-222, miR-96 and miR-143), and identified novel miRNAs such as miR-210, miR-542, miR-128 and miR-219 that do not have a known mode of action in prostate cancer. This suggests that these miRNAs could be as important as the already validated miRNAs, and could explain the summarized activity of the gene modules. miRNAs identified using the miRTI and Corrmir networks overlap; both networks identified miR-182 and miR-96 as important miRNAs. The advantage of using miRTI over Corrmir, Seq and W to identify miRNA influence on target partners or on patient gene expression is that it produces two types of modules, unlike W, which favors the first type of module, and Corrmir, which favors the second. Modules identified by our approach include miRNAs such as miR-96 and miR-182 that target highly interacting proteins, and miRNAs such as miR-1 and miR-205 that target non-interacting complexes.
miRNAs have been associated with clinical variables, prostate cancer recurrence and prostate cancer-specific death. However, the association between miRNAs that target protein modules and clinical and survival data has not been well studied. Recent evidence showed that low miR-1 in human prostate tumors is associated with early disease recurrence, and that elevated levels of miR-96 are associated with high Gleason score and higher risk of biochemical relapse. In this work we showed that miRNAs identified using the miRTI method are associated with cancer recurrence (Figure7). We also showed that patient-specific miRNA influences predicted using miRTI are better prognostic biomarkers than binary, non-weighted miRNA-target interactions. This indicates that there is a link between the influence of a miRNA on target partners and its influence on outcome, but further analysis on larger cohorts and biological experiments are required to confirm this result.
Comparing the three applications of miRTI revealed consistent results. They all indicate a significant role for specific miRNAs (miR-221, miR-222, miR-210, miR-542-5p, miR-96, miR-182, and miR-143) in prostate cancer. For instance, miR-96 and miR-182 are members of the same gene cluster, which supports the effectiveness of integrating protein networks to identify miRNAs with a similar mode of action. ActivityScore functional analysis indicates that zinc-finger proteins, zinc homeostasis, focal adhesion, and Wnt signaling are enriched in genes with high ActivityScore (p-value < 1 × 10⁻¹⁰). Evidence showed that zinc homeostasis is regulated by the miR-96-183-182 cluster. This is in agreement with our results, which show that miR-96 and miR-182 explain most of the gene ActivityScore, which in turn is significantly enriched in zinc homeostasis. Other predicted miRNAs (miR-143, miR-542) may play a role in zinc homeostasis, focal adhesion, and cytoskeleton organization.
The large scale protein interactions and miRNA target prediction data we used were useful to help elucidate the mechanistic role of miRNAs in disease progression. Although the interaction datasets are far from complete and suffer from noise, our results were consistent across choice of PPI network. Using additional protein interaction networks, different miRNA target prediction algorithms, and different expression data sets will likely reveal more miRNAs with high-influence on cancer progression. Another future direction for this work is designing a systematic method to combine the three variables that determine the influence of miRNAs on the target partners.
Finally, this study on bridging the gap between clinical bioinformatics and network-based biomarkers provides clear evidence that protein interaction information is useful to identify diagnostic and prognostic cancer biomarkers, and to ameliorate the understanding of the functional mechanisms of miRNAs.
We have developed a novel method to identify active miRNA-target modules relevant to prostate cancer progression and outcome. miRNAs with high influence on protein networks are valuable biomarkers that can be used in clinical investigations for prostate cancer treatment. Combining the effects of miRNAs on targets and target partners provides better understanding of miRNAs function.
M. Alshalalfa and R. Alhajj would like to thank NSERC for funding.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.