Journal of Biosciences

Volume 35, Issue 1, pp 105–118

Flanking region sequence information to refine microRNA target predictions


DOI: 10.1007/s12038-010-0013-7

Cite this article as:
Heikham, R. & Shankar, R. J Biosci (2010) 35: 105. doi:10.1007/s12038-010-0013-7

Abstract

The non-coding elements of a genome, many of which were earlier dismissed as junk, have now begun to gain long-overdue respectability, with microRNAs as the best current example. MicroRNAs bind preferentially to the 3′ untranslated regions (UTRs) of their target genes and, in most cases, negatively regulate their expression. Several microRNA:target prediction tools have been developed on the basis of various assumptions, and most of them consider seed conservation and the free energy of binding between a target and its microRNA. However, the average concordance between the predictions made by these tools is limited, and the problem is compounded by a large number of false positives. In this study, we describe a methodology, developed from the observations of a comprehensive study, for refining the predictions made by microRNA:target prediction tools. We incorporated information on the dinucleotide content variation patterns recorded for the regions flanking the target sites, using support vector machines (SVMs) trained on data from two major experimental sources as well as other sources. We assessed the performance of our methodology with rigorous tests over four different dataset models and compared it with a recently published refinement tool, MirTif. Our methodology attained a higher average accuracy (0.88), an average sensitivity of 0.81 and an average specificity of 0.94, and its areas under the ROC curve (AUCs) exceeded 0.9 for all four models. These results suggest that our methodology performs better and that flanking regions may play a role in controlling microRNA targeting. We applied our methodology to genes of the toll-like receptor (TLR), apoptosis and insulin pathways to predict the most probable targets. We also investigated their possible regulatory associations and identified an hsa-miR-23a regulatory module.
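The abstract does not specify how the flanking regions are delimited or how their dinucleotide content is encoded for the SVMs. As a minimal sketch, assuming fixed-length windows immediately upstream and downstream of a predicted site and overlapping dinucleotide frequencies as features, a 32-value feature vector per site could be computed as below. The function names and the 50-nt default window (`flank_len`) are hypothetical illustrations, not values taken from the paper.

```python
from itertools import product

# All 16 RNA dinucleotides in a fixed order (AA, AC, ..., UU) so that every
# feature vector has the same layout.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def dinucleotide_frequencies(seq: str) -> list[float]:
    """Relative frequency of each overlapping dinucleotide in seq.

    DNA input is accepted (T is read as U); dinucleotides containing any
    other character (e.g. N) are skipped. Returns all zeros if no valid
    dinucleotide is present.
    """
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def flank_features(utr: str, site_start: int, site_end: int,
                   flank_len: int = 50) -> list[float]:
    """Concatenate dinucleotide frequencies of the upstream and downstream
    flanks of a target site occupying utr[site_start:site_end].

    flank_len is a hypothetical default, not a value from the paper.
    Flanks are truncated at the ends of the UTR.
    """
    upstream = utr[max(0, site_start - flank_len):site_start]
    downstream = utr[site_end:site_end + flank_len]
    return dinucleotide_frequencies(upstream) + dinucleotide_frequencies(downstream)


if __name__ == "__main__":
    # Toy UTR with a 4-nt "site" at positions 8..11 and 8-nt flanks.
    features = flank_features("AUGCAUGCAAAAUUUUGGGG", 8, 12, flank_len=8)
    print(len(features))
```

Vectors like these, labeled as experimentally supported or unsupported sites, could then be passed to any SVM implementation; the kernel and its parameters are left open here because the abstract does not state them.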

Keywords

Bioinformatics; genome; microRNA; non-coding RNA

Abbreviations used

Ac: accuracy
AUC: area under the curve
FN: false negative
FP: false positive
MCC: Matthews correlation coefficient
ROC: receiver operating characteristic
Sn: sensitivity
Sp: specificity
SVM: support vector machine
TFBS: transcription factor-binding site
TLR: toll-like receptor
TN: true negative
TP: true positive
UTR: untranslated region
VDR: vitamin D receptor
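The abstract reports accuracy, sensitivity, specificity and AUC, and the abbreviations list also names TP, TN, FP, FN and MCC. The sketch below uses the standard textbook definitions of these measures: Sn = TP/(TP+FN), Sp = TN/(TN+FP), Ac = (TP+TN)/(TP+TN+FP+FN), MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. It is an illustration, not the authors' evaluation code; the function names are hypothetical, and undefined ratios are returned as 0.0 by assumption.

```python
import math


def confusion_counts(labels, predictions):
    """Count TP, TN, FP, FN for binary labels and predictions (1 = target, 0 = non-target)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Sn, Sp, Ac and MCC from confusion counts; undefined ratios fall back to 0.0."""
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    ac = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    predictions = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7]
    print(metrics(*confusion_counts(labels, predictions)))
    print(roc_auc(labels, scores))
```

The pairwise AUC loop is quadratic in the number of examples, which keeps the definition transparent; for large test sets a rank-based computation of the same Mann-Whitney statistic would be preferable.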

Supplementary material

12038_2010_13_MOESM1_ESM.pdf (878 kb)

Copyright information

© Indian Academy of Sciences 2010

Authors and Affiliations

1. Department of Bioinformatics and Structural Biology, Indian Institute of Advanced Research, Gandhinagar, India
