Mapping-Free and Assembly-Free Discovery of Inversion Breakpoints from Raw NGS Reads

  • Claire Lemaitre
  • Liviu Ciortuz
  • Pierre Peterlongo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8542)


We propose a formal model and an algorithm for detecting inversion breakpoints without a reference genome, directly from raw NGS data. The model is characterized by a fixed-size topological pattern in the de Bruijn graph. We describe precisely the possible sources of false positives and false negatives, and we additionally propose a sequence-based filter that gives a good trade-off between the precision and recall of the method. We implemented these ideas in a prototype called TakeABreak. Applied to simulated inversions in genomes of varying complexity (from E. coli to a human chromosome dataset), TakeABreak provided promising results with a low memory footprint and a short computation time.
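The abstract does not spell out the topological pattern itself, so the following is only a minimal sketch of the intuition behind a fixed-size signature, not TakeABreak's implementation. It assumes k-mers are stored in canonical form (the lexicographic minimum of a k-mer and its reverse complement), as is common for de Bruijn graphs built from reads. Under that assumption, an inversion leaves every k-mer inside the inverted segment unchanged, so only k-mers spanning the two breakpoints can differ. The function names, the example sequence, and the coordinates are all hypothetical.

```python
# Illustrative sketch only: names and data are hypothetical,
# and this is not the algorithm implemented in TakeABreak.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers of seq.

    A k-mer and its reverse complement are represented by the same string
    (the lexicographically smaller one), so strand does not matter.
    """
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def invert_segment(seq, start, end):
    """Simulate an inversion of seq[start:end] (reverse-complemented in place)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def breakpoint_kmers(reference, start, end, k):
    """Return (k-mers only in the reference, k-mers only in the inverted genome)."""
    inverted = invert_segment(reference, start, end)
    ref_kmers = canonical_kmers(reference, k)
    inv_kmers = canonical_kmers(inverted, k)
    return ref_kmers - inv_kmers, inv_kmers - ref_kmers


# Hypothetical toy genome with an inversion between positions 15 and 35.
reference = "ACGTTGCAAGGCTTACCGATAGGCTCAATCGGATTACGCTAGGACTTCAG"
only_ref, only_inv = breakpoint_kmers(reference, 15, 35, k=5)
print(sorted(only_ref))
print(sorted(only_inv))
```

Each of the two sets holds at most 2(k-1) k-mers, however long the inverted segment is, because only the k-1 windows overlapping each breakpoint can change. This bounded, local difference is what makes a fixed-size pattern a plausible signature for an inversion.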


Keywords: structural variant, NGS, reference-free, de Bruijn graph





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Claire Lemaitre (1)
  • Liviu Ciortuz (1, 2)
  • Pierre Peterlongo (1)

  1. INRIA/IRISA/GenScale, Rennes cedex, France
  2. Faculty of Computer Science, Iasi, Romania
