Identifying SNPs without a Reference Genome by Comparing Raw Reads

  • Pierre Peterlongo
  • Nicolas Schnel
  • Nadia Pisanti
  • Marie-France Sagot
  • Vincent Lacroix
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6393)


Next generation sequencing (NGS) technologies are being applied to many fields of biology, notably to survey polymorphism across individuals of a species. However, while single nucleotide polymorphisms (SNPs) are almost routinely identified in model organisms, detecting SNPs in non-model species remains very challenging because almost all methods rely on a reference genome. We address here the problem of identifying SNPs without a reference genome. For this, we propose an approach which compares two sets of raw reads. We show that a SNP corresponds to a recognisable pattern in the de Bruijn graph built from the reads, and we propose algorithms to identify these patterns, which we call mouths. We outline the potential of our method on real data. The method is tailored to short reads (typically Illumina), and works well even when the coverage is low, where it reports few but highly confident SNPs. Our program, called KisSnp, can be downloaded here: .
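
To make the idea of a SNP pattern in a de Bruijn graph concrete, here is a minimal sketch in Python. It is not the KisSnp algorithm and does not reproduce the paper's definition of a mouth; it only illustrates the general observation that an isolated single-base difference changes k consecutive k-mers, so the graph forks at one node and reconverges k nodes later. The function names (`kmers`, `successors`, `walk`, `spell`, `find_bubbles`) are hypothetical, and the sketch assumes error-free reads on a single strand, ignoring reverse complements and coverage.

```python
from itertools import combinations


def kmers(reads, k):
    """Return the set of all k-mers occurring in a collection of reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def successors(node, nodes):
    """Return the k-mers in `nodes` that overlap `node` by k-1 characters."""
    return [node[1:] + base for base in "ACGT" if node[1:] + base in nodes]


def walk(start, nodes, length):
    """Follow unambiguous edges from `start`, collecting at most `length` nodes."""
    path = [start]
    while len(path) < length:
        nxt = successors(path[-1], nodes)
        if len(nxt) != 1:  # dead end or further branching: stop here
            break
        path.append(nxt[0])
    return path


def spell(branch, path, end):
    """Spell the sequence read along branch -> path -> end."""
    return branch + "".join(node[-1] for node in path) + end[-1]


def find_bubbles(reads_a, reads_b, k):
    """Report pairs of sequences of length 2k+1 that fork and reconverge.

    Assumption (illustrative only): a candidate SNP is a node with two
    successors whose two outgoing paths each contain exactly k nodes and
    then lead to the same single node.
    """
    nodes = kmers(reads_a, k) | kmers(reads_b, k)
    bubbles = []
    for branch in sorted(nodes):
        for s1, s2 in combinations(successors(branch, nodes), 2):
            p1, p2 = walk(s1, nodes, k), walk(s2, nodes, k)
            if len(p1) < k or len(p2) < k:
                continue
            e1, e2 = successors(p1[-1], nodes), successors(p2[-1], nodes)
            if len(e1) == 1 and e1 == e2:
                bubbles.append((spell(branch, p1, e1[0]),
                                spell(branch, p2, e1[0])))
    return bubbles


# Two toy read sets whose underlying sequences differ by one base (A vs G).
reads_a = ["GATTACAC", "TACACTGGT"]
reads_b = ["GATTACGC", "ACGCTGGT"]
result = find_bubbles(reads_a, reads_b, 4)
print(result)
```

On this toy input the two reported sequences differ only at their middle position, which is where the variant sits. A practical tool would additionally have to handle sequencing errors, repeats and both strands, none of which this sketch attempts.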


Execution Time, Next Generation Sequencing, Reference Genome, Sequencing Error, Neisseria Meningitidis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Pierre Peterlongo (1)
  • Nicolas Schnel (1)
  • Nadia Pisanti (2)
  • Marie-France Sagot (3)
  • Vincent Lacroix (3)
  1. INRIA Rennes - Bretagne Atlantique, EPI Symbiose, Rennes, France
  2. Dipartimento di Informatica, Università di Pisa, Italy
  3. INRIA Rhône-Alpes, 38330 Montbonnot Saint-Martin, France and Université de Lyon, F-69000 Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
