Poly(A)-Tag Deep Sequencing Data Processing to Extract Poly(A) Sites
Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. Next-generation sequencing (NGS) technology has made it feasible to generate large-scale data and has opened new opportunities for intensive study of polyadenylation, particularly through deep sequencing of the transcriptome that targets the junction of the 3′-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow that identifies polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts is integrated to iteratively map single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and internal priming artifacts are removed. An ambiguous region is then introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites from different samples that fall within a range of 24 nucleotides are clustered into poly(A) clusters. This procedure can be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.
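The grouping and clustering steps described above can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation: the function names `group_pats` and `cluster_sites`, the tuple layout of a mapped PAT, and the toy coordinates are all hypothetical. It also assumes that the 24-nucleotide range means the maximum gap between neighbouring cleavage sites on the same chromosome and strand, and that PATs from all samples have already been pooled. Internal priming removal is omitted because it requires the genome sequence.

```python
from collections import Counter, defaultdict

# 24 nt comes from the abstract; treating it as the maximum gap between
# neighbouring cleavage sites is an assumption of this sketch.
MAX_GAP = 24


def group_pats(pats):
    """Collapse PATs mapped to the same (chrom, strand, position)
    into cleavage sites, keeping the number of supporting PATs."""
    return Counter(pats)


def cluster_sites(sites, max_gap=MAX_GAP):
    """Merge cleavage sites on the same chromosome and strand into
    poly(A) clusters whenever adjacent sites are at most max_gap apart."""
    by_strand = defaultdict(list)
    for (chrom, strand, pos), count in sites.items():
        by_strand[(chrom, strand)].append((pos, count))

    clusters = []
    for (chrom, strand), entries in sorted(by_strand.items()):
        entries.sort()
        start = end = entries[0][0]
        total = entries[0][1]
        for pos, count in entries[1:]:
            if pos - end <= max_gap:
                # Close enough: extend the current cluster.
                end = pos
                total += count
            else:
                # Too far: close the current cluster and start a new one.
                clusters.append((chrom, strand, start, end, total))
                start = end = pos
                total = count
        clusters.append((chrom, strand, start, end, total))
    return clusters


# Toy input: mapped PATs pooled across samples (hypothetical coordinates).
pats = [
    ("chr1", "+", 1000), ("chr1", "+", 1000), ("chr1", "+", 1010),
    ("chr1", "+", 1100), ("chr1", "-", 1005),
]
sites = group_pats(pats)
clusters = cluster_sites(sites)
for cluster in clusters:
    print(cluster)
```

Each output tuple holds the chromosome, strand, cluster start, cluster end, and total PAT count. Chaining adjacent sites keeps the sketch short; a pipeline that instead caps the total cluster width at 24 nucleotides would need a different merge condition.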
Key words: Polyadenylation site; Next-generation sequencing; Genomic data; Poly(A) clusters; Bioinformatic processing; PAT-seq
Funding for this work was provided by the National Natural Science Foundation of China (Nos. 61174161 and 61304141), the Natural Science Foundation of Fujian Province of China (No. 2012J01154), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Nos. 20130121130004 and 20120121120038), the Fundamental Research Funds for the Central Universities in China (Xiamen University: No. 2013121025), the Xiamen Shuangbai Talent Plan (to QQL), and the US National Science Foundation (grant nos. IOS-0817829 and IOS-1353354 to QQL).
- 5. Shen Y, Venu RC, Nobuta K, Wu X, Notibala V, Demirci C, Meyers BC, Wang G-L, Ji G, Li QQ (2011) Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing. Genome Res 21(9):1478–1486. doi: 10.1101/gr.114744.110