CRISPR Detection from Short Reads Using Partial Overlap Graphs
Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes that form part of an adaptive immune system against phages. Most automated tools that detect CRISPR loci rely on assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads long enough to contain two copies of the same repeat. We developed a method to identify CRISPR repeats from raw sequence data consisting of short reads. The algorithm is based on an observation that differentiates CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other tools fail to do so.
Keywords: CRISPR, Detection, Overlap graph, Partial overlap graph, Sampling, Filtering, \(k\)-mer counting
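For readers unfamiliar with the overlap graph mentioned in the abstract, the following is a minimal sketch of a generic suffix-prefix overlap graph over short reads. It is not the partial construction developed in the paper, whose details are not given here; the function name `overlap_graph`, the parameter `min_overlap`, and the toy reads are assumptions made purely for illustration.

```python
from collections import defaultdict


def overlap_graph(reads, min_overlap):
    """Build a generic suffix-prefix overlap graph (illustrative sketch only).

    Returns a dict mapping a read index a to a list of (b, overlap_length)
    pairs, where a proper suffix of reads[a] of length >= min_overlap
    equals a prefix of reads[b].
    """
    # Index every read by its first min_overlap characters so that
    # candidate partners can be looked up instead of compared pairwise.
    prefix_index = defaultdict(list)
    for idx, read in enumerate(reads):
        if len(read) >= min_overlap:
            prefix_index[read[:min_overlap]].append(idx)

    graph = defaultdict(list)
    for a, read_a in enumerate(reads):
        # Consider each proper suffix of read_a with length >= min_overlap.
        for start in range(1, len(read_a) - min_overlap + 1):
            seed = read_a[start:start + min_overlap]
            suffix = read_a[start:]
            for b in prefix_index.get(seed, []):
                if b == a:
                    continue  # ignore self-overlaps in this sketch
                # The seed matched; confirm the whole suffix is a prefix of b.
                if reads[b].startswith(suffix):
                    graph[a].append((b, len(suffix)))
    return dict(graph)


# Hypothetical toy reads, chosen only to show the call.
toy_reads = ["ACGTAC", "GTACGG", "ACGGTT"]
toy_graph = overlap_graph(toy_reads, min_overlap=3)
```

An edge from read a to read b says that b could extend a to the right. Repetitive regions such as CRISPR arrays tend to produce many competing edges in such a graph, which is one reason assemblers struggle with them.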