Pseudogenes pp 157-183

Methods to Detect Transcribed Pseudogenes: RNA-Seq Discovery Allows Learning Through Features

Part of the Methods in Molecular Biology book series (MIMB, volume 1167)


The detection of transcripts and the measurement of their associated activity at the pseudogene scale have recently become important topics of research. Pseudogenes are an integral part of many recent studies aimed at establishing a role for a variety of noncoding RNA structures, and their popularity has substantially increased owing to the discovery of regulatory properties and complex mechanisms of action that, while requiring further investigation, analysis, and validation, also promise to have a broad impact on human disease.

Currently, relatively few methodologies are specifically designed to detect pseudogene transcripts, and tools that either replace or integrate manual annotation procedures are very much needed. In particular, we believe there is a strong case for advancing the computational treatment of pseudogenes at the whole-transcriptome level. Catalogs of human pseudogenes have started to be delivered through RNA-Seq technologies; however, only a limited number of transcriptomes have been covered. Furthermore, while most proposals have led to a targeted algorithm, used mainly for detection, few computational pipelines have been designed following a comprehensive approach that addresses both identification and quantification of transcriptional activity within a unifying methodological frame.

Given the currently incomplete evidence, the limited impact resulting from the lack of extensive testing, and the unresolved uncertainties affecting the reproducibility of results, our proposal of a new computational approach is well motivated and timely. We have considered a hybrid approach, based on the assembly of a variety of computational tools, including RNA-Seq methods and machine learning applications, all applied to transcriptome data of various complexities. Our initial strategy is to provide lists of pseudogenes to be validated against the currently known examples, in order to extend our knowledge further. The ultimate goal naturally linked to this work is an automatic approach that analyzes transcriptomes to detect candidate pseudogenes through characteristic features and that supports efficient and reproducible pseudogene classification models.
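The idea of learning through features can be illustrated with a minimal sketch. It is not the pipeline described in the chapter: the three feature names, the toy values, and the nearest-centroid classifier are all assumptions chosen only to show how a per-transcript feature matrix might feed a simple predictive model.

```python
from statistics import mean, pstdev

# Hypothetical per-transcript features (illustrative only, not from the chapter).
FEATURES = ("length", "expression", "unique_fraction")


def build_matrix(records):
    """Turn a list of feature dicts into a matrix with one row per transcript."""
    return [[float(r[name]) for name in FEATURES] for r in records]


def fit_scaler(matrix):
    """Compute column means and standard deviations for z-score scaling."""
    cols = list(zip(*matrix))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 if a column is constant
    return means, stds


def scale(matrix, scaler):
    """Apply z-score scaling so that no single feature dominates the distance."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in matrix]


def fit_centroids(matrix, labels):
    """Compute the mean feature vector of each class."""
    centroids = {}
    for label in set(labels):
        rows = [row for row, lab in zip(matrix, labels) if lab == label]
        centroids[label] = [mean(c) for c in zip(*rows)]
    return centroids


def predict(row, centroids):
    """Return the label of the centroid closest to the row (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(row, centroids[label]))


# Toy training data; the values are invented for illustration.
train = [
    {"length": 900, "expression": 1.2, "unique_fraction": 0.35},
    {"length": 1100, "expression": 0.8, "unique_fraction": 0.40},
    {"length": 2500, "expression": 15.0, "unique_fraction": 0.95},
    {"length": 3100, "expression": 22.0, "unique_fraction": 0.90},
]
labels = ["pseudogene", "pseudogene", "gene", "gene"]

X = build_matrix(train)
scaler = fit_scaler(X)
centroids = fit_centroids(scale(X, scaler), labels)

query = build_matrix([{"length": 1000, "expression": 1.0, "unique_fraction": 0.30}])
prediction = predict(scale(query, scaler)[0], centroids)
print(prediction)  # pseudogene
```

In practice the features would be derived from assembled and quantified RNA-Seq data, and the predicted labels would be checked against known pseudogene annotations, as the strategy above suggests.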

Key words

Pseudogenes, Transcriptomes, RNA-Seq, Feature matrix, Predictive inference



EC thanks Laura Poliseno for fruitful discussions on the topic and for the introduction to stimulating readings.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Center for Computational Science, University of Miami, Miami, USA
  2. Laboratory of Integrative Systems Medicine (LISM), CNR, Institute of Clinical Physiology, Pisa, Italy
