Extraction of Poly(A) Sites from Large-Scale RNA-seq Data
The NCBI maintains the Sequence Read Archive (SRA) database to store RNA-Seq data generated by different next-generation sequencing (NGS) technologies. As the number of completed and ongoing genome and transcriptome sequencing projects keeps growing, the data in SRA are expanding rapidly and offer a rich resource for mining information that improves our understanding of biological processes such as mRNA 3′-end formation and alternative polyadenylation. We developed a bioinformatics pipeline that processes raw SRA sequence data and yields high-quality poly(A) sites and poly(A) cluster sites with detailed expression information. The pipeline is designed to be generic and can be used for polyadenylation studies in any eukaryotic species.
Key words: Polyadenylation, RNA-Seq, SRA, Poly(A) site, Data mining
This project was supported by a grant from the US National Institutes of Health (NIH-AREA) (1R15GM94732-1 A1 to CL and QQL), and by the US National Science Foundation (grant nos. IOS-0817829 and IOS-1353354 to QQL).