
Identifying SNPs without a Reference Genome by Comparing Raw Reads

  • Conference paper
String Processing and Information Retrieval (SPIRE 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6393)



Next-generation sequencing (NGS) technologies are being applied to many fields of biology, notably to survey polymorphism across individuals of a species. However, while single nucleotide polymorphisms (SNPs) are almost routinely identified in model organisms, detecting SNPs in non-model species remains very challenging because almost all methods rely on a reference genome. Here we address the problem of identifying SNPs without a reference genome. To this end, we propose an approach that compares two sets of raw reads. We show that a SNP corresponds to a recognisable pattern, which we call a mouth, in the de Bruijn graph built from the reads, and we propose algorithms to identify these patterns. We outline the potential of our method on real data. The method is tailored to short reads (typically Illumina) and works well even at low coverage, where it reports few but highly confident SNPs. Our program, called KisSnp, is available for download.
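To illustrate why an isolated SNP leaves a recognisable trace in a de Bruijn graph, here is a minimal Python sketch. It is not the KisSnp implementation and does not reproduce the paper's exact definition of a mouth. It assumes error-free reads, ignores reverse complements, and treats a SNP as a simple bubble: a k-mer with two successors whose branches each pass through exactly k non-branching k-mers (the k k-mers that overlap the variant position) before rejoining at a common k-mer. Each k-mer is tagged with the read set(s) it came from, so the output shows which set supports which allele. The names `build_dbg`, `walk`, and `find_snp_bubbles` are hypothetical.

```python
from collections import defaultdict


def build_dbg(read_sets, k):
    """Node-centric de Bruijn graph: nodes are k-mers, and u -> v is an edge
    whenever u[1:] == v[:-1] and both k-mers occur in some read.
    origin[kmer] records which read sets (0, 1, ...) contain the k-mer."""
    origin = defaultdict(set)
    for idx, reads in enumerate(read_sets):
        for read in reads:
            for i in range(len(read) - k + 1):
                origin[read[i:i + k]].add(idx)
    # Successors are looked up by extending the (k-1)-suffix with each base.
    succ = {km: [km[1:] + c for c in "ACGT" if km[1:] + c in origin]
            for km in origin}
    return origin, succ


def walk(start, succ, steps):
    """Follow unique successors until the path holds `steps` nodes.
    Returns None if any node along the way branches or dead-ends."""
    path = [start]
    while len(path) < steps:
        nxt = succ[path[-1]]
        if len(nxt) != 1:
            return None
        path.append(nxt[0])
    return path


def find_snp_bubbles(read_sets, k):
    """Report isolated-SNP bubbles (simplifying assumption: no nearby
    variants, no sequencing errors, single strand only)."""
    origin, succ = build_dbg(read_sets, k)
    hits = []
    for u, outs in succ.items():
        if len(outs) != 2:
            continue
        # A SNP changes exactly k overlapping k-mers, so each branch has k nodes.
        p1, p2 = (walk(x, succ, k) for x in outs)
        if p1 is None or p2 is None:
            continue
        end1, end2 = succ[p1[-1]], succ[p2[-1]]
        if len(end1) == 1 and end1 == end2:
            w = end1[0]
            # Spell each allele: divergence k-mer + branch + convergence base.
            alleles = tuple(u + "".join(x[-1] for x in p) + w[-1]
                            for p in (p1, p2))
            # Read sets that contain every k-mer of each branch.
            support = tuple(set.intersection(*(origin[x] for x in p))
                            for p in (p1, p2))
            hits.append((alleles, support))
    return hits


hits = find_snp_bubbles([["TGCAAGTTC"], ["TGCACGTTC"]], 4)
print(hits)  # [(('TGCAAGTTC', 'TGCACGTTC'), ({0}, {1}))]
```

With k = 4, the two reads differ only at position 4 (A versus C), so the graph splits after `TGCA` into two four-node branches that rejoin at `GTTC`; each branch is supported by a single read set. Real data would additionally need reverse-complement handling and a coverage threshold to separate SNPs from sequencing errors.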





Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peterlongo, P., Schnel, N., Pisanti, N., Sagot, MF., Lacroix, V. (2010). Identifying SNPs without a Reference Genome by Comparing Raw Reads. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg.


  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16320-3

  • Online ISBN: 978-3-642-16321-0

  • eBook Packages: Computer Science, Computer Science (R0)
