Abstract
Each day, more transcripts are being discovered across the genome, especially in poorly annotated species, thanks to rapid progress in high-throughput technologies such as RNA sequencing. This raises the challenge of classifying newly identified transcripts as protein coding or noncoding. Here, we describe a de novo approach named coding–noncoding index (CNCI), which profiles adjoining nucleotide triplets (ANT) to distinguish protein-coding from noncoding sequences independently of known annotations. The main advantage of CNCI is its ability to accurately classify transcripts assembled from whole-transcriptome sequencing data across species: trained on data from well-annotated species (such as human and Arabidopsis), it can be applied to vertebrates and invertebrates. In this chapter, we illustrate the CNCI method in detail using RNA-sequencing data generated from six biological replicates of six mouse tissues. CNCI software is available at http://www.bioinfo.org/software/cnci.
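The idea of profiling adjoining nucleotide triplets can be illustrated with a short sketch. It is not CNCI's implementation: the scoring table (a log-ratio of ANT frequencies in coding versus noncoding training sequences), the pseudocount, the restriction to the three forward reading frames, and the names `ant_counts`, `ant_log_ratio`, and `best_segment_score` are all assumptions made for illustration. The sketch counts pairs of consecutive, non-overlapping triplets in a reading frame, scores each pair, and reports the best-scoring contiguous run of pairs in any frame; a higher score suggests more coding-like sequence composition.

```python
import math
from collections import Counter
from itertools import product

# All 64 DNA triplets; ordered pairs of them give 64 x 64 = 4096 possible ANTs.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codons_in_frame(seq, frame):
    """Split seq into consecutive, non-overlapping triplets starting at `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def ant_counts(seq, frame=0):
    """Count adjoining triplet pairs (triplet i, triplet i+1) in one reading frame."""
    codons = codons_in_frame(seq, frame)
    return Counter(
        (a, b) for a, b in zip(codons, codons[1:])
        if a in CODON_SET and b in CODON_SET  # skip triplets containing N, etc.
    )


def ant_log_ratio(coding_seqs, noncoding_seqs, pseudo=1.0):
    """Hypothetical scoring table: log(P_coding(ANT) / P_noncoding(ANT)).

    Assumes coding training sequences are CDSs given in frame 0; noncoding
    sequences are counted in all three forward frames. The pseudocount keeps
    scores for unseen pairs finite.
    """
    cod, non = Counter(), Counter()
    for s in coding_seqs:
        cod.update(ant_counts(s, 0))
    for s in noncoding_seqs:
        for f in range(3):
            non.update(ant_counts(s, f))
    n_pairs = len(CODONS) ** 2
    cod_total = sum(cod.values()) + pseudo * n_pairs
    non_total = sum(non.values()) + pseudo * n_pairs
    return {
        (a, b): math.log(((cod[(a, b)] + pseudo) / cod_total)
                         / ((non[(a, b)] + pseudo) / non_total))
        for a in CODONS for b in CODONS
    }


def best_segment_score(seq, table):
    """Best-scoring contiguous run of ANTs over the three forward frames.

    Applies Kadane's maximum-subarray algorithm to the per-pair log-ratios;
    returns -inf if the sequence is too short to contain any pair.
    """
    best = float("-inf")
    for frame in range(3):
        codons = codons_in_frame(seq, frame)
        running = float("-inf")
        for a, b in zip(codons, codons[1:]):
            s = table.get((a, b), 0.0)  # pairs with non-ACGT triplets score 0
            running = max(s, running + s)
            best = max(best, running)
    return best


# Toy usage with made-up training sequences (not real data).
table = ant_log_ratio(["ATGGCCGCCGCCGCCGCCGCCGCCTAA"], ["T" * 24])
print(best_segment_score("GCCGCCGCCGCCGCC", table) > 0)  # True: coding-like
print(best_segment_score("T" * 15, table) < 0)           # True: noncoding-like
```

A maximum-subarray search is used here so that a coding region embedded in a longer transcript (for example, one flanked by UTRs) can still score highly; a real classifier would be trained on large annotated sets and would combine such scores with other features.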
Copyright information
© 2014 Springer Science+Business Media New York
About this protocol
Cite this protocol
Luo, H., Bu, D., Sun, L., Chen, R., Zhao, Y. (2014). De Novo Approach to Classify Protein-Coding and Noncoding Transcripts Based on Sequence Composition. In: Alvarez, M., Nourbakhsh, M. (eds) RNA Mapping. Methods in Molecular Biology, vol 1182. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1062-5_18
DOI: https://doi.org/10.1007/978-1-4939-1062-5_18
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-1061-8
Online ISBN: 978-1-4939-1062-5
eBook Packages: Springer Protocols