Hidden Treasures in Contemporary RNA Sequencing

  • Serghei Mangul
  • Harry Taegyun Yang
  • Eleazar Eskin
  • Noah Zaitlen

Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Table of contents

  1. Front Matter
    Pages i-v
  2. Serghei Mangul, Harry Taegyun Yang, Eleazar Eskin, Noah Zaitlen
    Pages 1-93

About this book


Advances in RNA-sequencing (RNA-seq) technologies have provided an unprecedented opportunity to explore the gene expression landscape across individuals, tissues, and environments by efficiently profiling the RNA sequences present in the samples. When a reference genome sequence or a transcriptome of the sample is available, mapping-based RNA-seq analysis protocols align the RNA-seq reads to the reference sequences, identify novel transcripts, and quantify the abundance of expressed transcripts.
Reads that fail to map to the human reference, known as unmapped reads, are a large and often overlooked output of standard RNA-seq analyses. Even in carefully executed experiments, unmapped reads can make up a considerable fraction of all reads produced. Some arise from technical sequencing artifacts, produced by low-quality and error-prone copies of the nascent RNA sequence being sampled. Reads can also remain unmapped because of unknown transcripts, recombined B and T cell receptor sequences, A-to-G mismatches from A-to-I RNA editing, trans-splicing, gene fusions, circular RNAs, and non-host RNA sequences (e.g. bacterial, fungal, and viral organisms). Unmapped reads are therefore a rich resource for studying B and T cell receptor repertoires and the human microbiome, without incurring the expense of additional targeted sequencing.
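As a rough illustration of how unmapped reads can be separated from an aligner's output, the sketch below assumes the alignments are available as SAM text. That is an assumption: the description above does not name a file format. In SAM, bit 0x4 of the FLAG field marks a segment as unmapped. The function names and the toy records are hypothetical and are not part of ROP.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_sam_records(lines):
    """Split SAM text lines into (mapped, unmapped) lists of read names."""
    mapped, unmapped = [], []
    for line in lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            unmapped.append(name)
        else:
            mapped.append(name)
    return mapped, unmapped


def unmapped_fraction(lines):
    """Fraction of records flagged as unmapped (0.0 if there are no records)."""
    mapped, unmapped = split_sam_records(lines)
    total = len(mapped) + len(unmapped)
    return len(unmapped) / total if total else 0.0


# Hypothetical toy records, for illustration only.
TOY_SAM = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    "read3\t16\tchr2\t50\t60\t4M\t*\t0\t0\tGGGG\tIIII",
]

if __name__ == "__main__":
    mapped_names, unmapped_names = split_sam_records(TOY_SAM)
    print("mapped:", mapped_names)
    print("unmapped:", unmapped_names)
    print("unmapped fraction:", unmapped_fraction(TOY_SAM))
```

In practice a dedicated SAM/BAM library would handle compressed files and paired-end flags; the plain-text parsing here only shows the flag test itself.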
This book introduces and describes the Read Origin Protocol (ROP), a tool that identifies the origin of both mapped and unmapped reads. The protocol first identifies human reads by mapping them onto a reference genome and transcriptome with a standard high-throughput algorithm. After alignment, the mapped reads are grouped into genomic (e.g. CDS, UTRs, introns) and repetitive (e.g. SINEs, LINEs, LTRs) categories. The remaining steps of ROP characterize the unmapped reads, i.e. those that failed to map to the human reference sequences.
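The grouping step for mapped reads can be pictured with a small sketch. It is not ROP's implementation: it assumes each aligned read is reduced to a single (chromosome, position) pair and that annotations are half-open intervals labeled with a category. The toy annotation, coordinates, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy annotation: (chrom, start, end, category), 0-based half-open.
# Genomic and repetitive features may overlap (e.g. a SINE inside an intron).
TOY_ANNOTATION = [
    ("chr1", 100, 200, "UTR"),
    ("chr1", 200, 500, "CDS"),
    ("chr1", 500, 900, "intron"),
    ("chr1", 600, 700, "SINE"),
]


def categories_for(chrom, pos, annotation):
    """Return the sorted categories whose interval contains the position."""
    return sorted(
        {cat for c, start, end, cat in annotation if c == chrom and start <= pos < end}
    )


def summarize(alignments, annotation):
    """Count reads per category; reads with chrom None are left as unmapped."""
    counts = Counter()
    for chrom, pos in alignments:
        if chrom is None:
            counts["unmapped"] += 1  # handed on to the later characterization steps
            continue
        cats = categories_for(chrom, pos, annotation)
        for cat in cats or ["unannotated"]:
            counts[cat] += 1
    return counts


# Hypothetical read positions, for illustration only.
TOY_ALIGNMENTS = [("chr1", 150), ("chr1", 250), ("chr1", 650), ("chr1", 950), (None, None)]

if __name__ == "__main__":
    for category, n in sorted(summarize(TOY_ALIGNMENTS, TOY_ANNOTATION).items()):
        print(category, n)
```

A read that falls in both a genomic and a repetitive feature is counted once in each, so the two groupings are tallied independently here. A real pipeline would compare full alignment intervals against an indexed annotation rather than scanning a list by single position.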


Keywords: Next-Generation Sequencing; RNA Sequencing; Transcriptomics; Read Mapping; Circular RNAs; Gene Fusions; Immune Receptor Repertoire; Microbial Communities; Virome

Authors and affiliations

  • Serghei Mangul (1)
  • Harry Taegyun Yang (2)
  • Eleazar Eskin (3)
  • Noah Zaitlen (4)

  1. Department of Computer Science, Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, USA
  2. Department of Computer Science, University of California Los Angeles, Los Angeles, USA
  3. Department of Computer Science, Department of Human Genetics, University of California Los Angeles, Los Angeles, USA
  4. Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, Cardiovascular Research Institute, University of California, San Francisco, USA

Bibliographic information

  • DOI
  • Copyright Information: The Author(s), under exclusive license to Springer Nature Switzerland AG 2019
  • Publisher Name: Springer, Cham
  • eBook Packages: Computer Science; Computer Science (R0)
  • Print ISBN: 978-3-030-13972-8
  • Online ISBN: 978-3-030-13973-5
  • Series Print ISSN: 2191-5768
  • Series Online ISSN: 2191-5776