Poly(A)-Tag Deep Sequencing Data Processing to Extract Poly(A) Sites

  • Xiaohui Wu
  • Guoli Ji
  • Qingshun Quinn Li
Part of the Methods in Molecular Biology book series (MIMB, volume 1255)


Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. Next-generation sequencing (NGS) technology makes it feasible to generate large-scale data and opens new opportunities for intensive study of polyadenylation. This is particularly true of deep sequencing of the transcriptome that targets the junction between the 3′-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow that identifies polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalization, and clustering. In this pipeline, a series of Perl scripts is integrated to iteratively map single- or paired-end sequences to the reference genome. After mapping, poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and internal priming artifacts are removed. Next, an ambiguous region is introduced when parsing the genome annotation for cleavage site clustering. Finally, cleavage sites that lie within 24 nucleotides of each other, including sites from different samples, are merged into poly(A) clusters. This procedure can identify thousands of reliable poly(A) clusters from millions of NGS sequences across different tissues or treatments.
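The post-mapping steps described in the abstract can be sketched in Python. These steps are grouping PATs into cleavage sites, removing internal priming artifacts, and clustering sites within 24 nt. This is not the authors' Perl pipeline. It is a minimal illustration that assumes PATs have already been mapped and are reported as hypothetical `(chrom, strand, pos, sample)` tuples. Each `pos` is 0-based and marks the last transcribed nucleotide before the tail.

The 24-nt distance comes from the text. The rest are illustrative assumptions:

- the single-linkage merge rule;
- the internal-priming test (at least 6 A's in the 10 nt downstream of the site, read on the transcript strand);
- all function names.

The ambiguous-region annotation step is omitted.

```python
from collections import defaultdict

CLUSTER_DIST = 24  # from the text: sites within 24 nt are clustered
IP_WINDOW = 10     # assumption: downstream window size for internal priming
IP_MIN_A = 6       # assumption: A count in that window that flags a site


def group_pats(pats):
    """Collapse PATs mapped to the same (chrom, strand, pos) into one cleavage site.

    Returns {(chrom, strand, pos): {sample: PAT count}}.
    """
    sites = defaultdict(lambda: defaultdict(int))
    for chrom, strand, pos, sample in pats:
        sites[(chrom, strand, pos)][sample] += 1
    return {key: dict(counts) for key, counts in sites.items()}


def is_internal_priming(genome, chrom, strand, pos):
    """Hypothetical filter: True if the genome just downstream of the site is A-rich."""
    seq = genome[chrom].upper()
    if strand == "+":
        # Downstream of a plus-strand site lies to the right.
        window = seq[pos + 1 : pos + 1 + IP_WINDOW]
        return window.count("A") >= IP_MIN_A
    # Minus strand: downstream lies to the left, and A on the transcript is T here.
    window = seq[max(0, pos - IP_WINDOW) : pos]
    return window.count("T") >= IP_MIN_A


def cluster_sites(sites, max_dist=CLUSTER_DIST):
    """Assumed single-linkage rule: a site joins the previous cluster on the same
    chrom/strand if it lies <= max_dist nt from that cluster's last site."""
    clusters = []
    for chrom, strand, pos in sorted(sites):
        counts = sites[(chrom, strand, pos)]
        last = clusters[-1] if clusters else None
        if (last is not None and last["chrom"] == chrom
                and last["strand"] == strand and pos - last["end"] <= max_dist):
            last["end"] = pos
            last["sites"].append(pos)
            # Pool PAT counts per sample so clusters span samples.
            for sample, n in counts.items():
                last["counts"][sample] = last["counts"].get(sample, 0) + n
        else:
            clusters.append({"chrom": chrom, "strand": strand, "start": pos,
                             "end": pos, "sites": [pos], "counts": dict(counts)})
    return clusters


def extract_poly_a_clusters(pats, genome):
    """Group PATs, drop internal-priming sites, then cluster the survivors."""
    sites = group_pats(pats)
    kept = {key: counts for key, counts in sites.items()
            if not is_internal_priming(genome, *key)}
    return cluster_sites(kept)
```

In this sketch, sites are sorted by chromosome, strand, and position, so one linear pass is enough to build clusters. A production pipeline would also weigh normalized PAT counts and use genome annotation when deciding cluster boundaries.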

Key words

Polyadenylation site · Next-generation sequencing · Genomic data · Poly(A) clusters · Bioinformatic processing · PAT-seq



Funding support for this work came from several sources:

- the National Natural Science Foundation of China (Nos. 61174161 and 61304141);
- the Natural Science Foundation of Fujian Province of China (No. 2012J01154);
- the Specialized Research Fund for the Doctoral Program of Higher Education of China (Nos. 20130121130004 and 20120121120038);
- the Fundamental Research Funds for the Central Universities in China (Xiamen University: No. 2013121025);
- the Xiamen Shuangbai Talent Plan (to QQL);
- the US National Science Foundation (grant nos. IOS-0817829 and IOS-1353354 to QQL).



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Automation, Xiamen University, Xiamen, China
  2. Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, China
  3. Department of Biology, Miami University, Oxford, USA
