Mapping RNA-seq Data to a Transcript Graph via Approximate Pattern Matching to a Hypertext
Graphs are the most suited data structure to summarize the transcript isoforms produced by a gene. Such graphs may be modeled by the notion of hypertext, that is a graph where nodes are texts representing the exons of the gene and edges connect consecutive exons of a transcript. Mapping reads obtained by deep transcriptome sequencing to such graphs is crucial to compare reads with an annotation of transcript isoforms and to infer novel events due to alternative splicing at the exonic level.
In this paper, we propose an algorithm based on Maximal Exact Matches that efficiently solves the approximate pattern matching of a pattern P to a hypertext H. We implement it into Splicing Graph ALigner (SGAL), a tool that performs an accurate mapping of RNA-seq reads against a graph that is a representation of annotated and potentially new transcripts of a gene. Moreover, we performed an experimental analysis to compare SGAL to a state-of-art tool for spliced alignment (STAR), and to identify novel putative alternative splicing events such as exon skipping directly from mapping reads to the graph. Such analysis shows that our tool is able to perform accurate mapping of reads to exons, with good time and space performance.
The software is freely available at https://github.com/AlgoLab/galig.
KeywordsApproximate sequence analysis Next-generation sequencing Alternative splicing Graph-based alignment
We thank the anonymous reviewers for their insightful comments.
- 4.Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, 2nd edn. (2001)Google Scholar
- 12.Manber, U., Wu, S.: Approximate string matching with arbitrary costs for text and hypertext. In: Proceedings of the IAPR International Workshop on Structural and Syntactic Pattern Recognition, pp. 22–33 (1993)Google Scholar
- 15.Rhoads, A., Au, K.F.: PacBio sequencing and its applications. Genomics Proteomics Bioinform. 13(5), 278–289 (2015). sI: Metagenomics of Marine EnvironmentsGoogle Scholar
- 16.Sirén, J.: Indexing variation graphs. CoRR abs/1604.06605 (2016)Google Scholar